The efficiency of two Bayesian order estimators is studied. By using nonparametric techniques, we prove new underestimation and overestimation bounds. The results apply to various models, including mixture models. In this case, the errors are shown to be $O(e^{-an})$ and $O((\log n)^b/\sqrt{n})$ ($a, b > 0$), respectively.

1. Introduction. Order identification deals with the estimation and test of a structural parameter which indexes the complexity of a model. In other words, the most economical representation of a random phenomenon is sought. This problem is encountered in many situations, including: mixture models [13, 19] with an unknown number of components; cluster analysis [9], when the number of clusters is unknown; autoregressive models [1], when the process memory is not known.

This paper is devoted to the study of two Bayesian estimators of the order of a model. Frequentist properties of efficiency are particularly investigated. We obtain new efficiency bounds under mild assumptions, providing a theoretical answer to the questions raised, for instance, in [7] (see their Section 4).

1.1. Description of the problem. We observe $n$ i.i.d. random variables (r.v.) $(Z_1, \ldots, Z_n) = Z^n$ with values in a measured sample space $(\mathcal{Z}, \mathcal{F}, \mu)$. Let $(\Theta_k)_{k\ge1}$ be an increasing family of nested parametric sets and $d$ the Euclidean distance on each. The dimension of $\Theta_k$ is denoted by $D(k)$. Let $\Theta_\infty = \bigcup_{k\ge1}\Theta_k$ and, for every $\theta \in \Theta_\infty$, let $f_\theta$ be the density of the probability measure $P_\theta$ with respect to the measure $\mu$.

The order of any distribution $P_{\theta_0}$ is the unique integer $k$ such that $P_{\theta_0} \in \{P_\theta : \theta \in \Theta_k \setminus \Theta_{k-1}\}$ (with convention $\Theta_0 = \emptyset$). It is assumed that the distribution $P^*$ of $Z_1$ belongs to $\{P_\theta : \theta \in \Theta_\infty\}$. The density of $P^*$ is denoted by $f^* = f_{\theta^*}$ ($\theta^* \in \Theta_{k^*} \setminus \Theta_{k^*-1}$). The order of $P^*$ is denoted by $k^*$, and is the quantity of interest here.

Received November 2005; revised May 2007.
AMS 2000 subject classifications. 62F05, 62F12, 62G05, 62G10.
Key words and phrases. Mixture, model selection, nonparametric Bayesian inference, order estimation, rate of convergence.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2008, Vol. 36, No. 2, 938-962. This reprint differs from the original in pagination and typographic detail.

2 A. CHAMBAZ AND J. ROUSSEAU

We are interested in frequentist properties of two Bayesian estimates of $k^*$. In that perspective, the problem can be restated as an issue of composite hypotheses testing (see [4]), where the quantities of interest are $P^*\{\hat k_n < k^*\}$ and $P^*\{\hat k_n > k^*\}$, the under- and overestimation errors, respectively. In this paper we determine upper bounds on both errors for the estimators $\hat k_n^G$ and $\hat k_n^L$ defined as follows.

Let $\Pi$ be a prior on $\Theta_\infty$ that writes as $d\Pi(\theta) = \pi(k)\pi_k(\theta)\,d\theta$, for all $\theta \in \Theta_k$ and $k \ge 1$. We denote by $\Pi(k|Z^n)$ the posterior probability of each $k \ge 1$. In a Bayesian decision theoretic perspective, the Bayes estimator associated with the 0-1 loss function is the mode of the posterior distribution of the order $k$:

$$\hat k_n^G = \arg\max_{k\ge1}\{\Pi(k|Z^n)\}.$$

It is a global estimator. Following a more local and sequential approach, we propose another estimator:

$$\hat k_n^L = \inf\{k \ge 1 : \Pi(k|Z^n) > \Pi(k+1|Z^n)\} \le \hat k_n^G.$$

If the posterior distribution on $k$ is unimodal, then obviously both estimators are equal. The advantage of $\hat k_n^L$ over $\hat k_n^G$ is that $\hat k_n^L$ does not require the computation of the whole posterior distribution on $k$. It can also be slightly modified into the smallest integer $k$ such that the Bayes factor comparing $\Theta_{k+1}$ to $\Theta_k$ is less than one. When considering a model comparison point of view, Bayes factors are often used to compare two models; see [11]. In the following, we shall focus on $\hat k_n^G$ and $\hat k_n^L$, since the sequential Bayes factor estimator shares the same properties as $\hat k_n^L$.
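As an illustration (not part of the paper), the following Python sketch computes both estimators from the posterior masses $\Pi(k|Z^n)$, $k = 1, \ldots, K$, supplied on the log scale up to an additive constant. The truncation at a finite $K$ and the function name `order_estimators` are assumptions made for the example; when the posterior never strictly decreases among the $K$ candidates, the sketch returns $K$ for the local estimator, which is what the definition gives for a prior supported on $k \le K$ (then $\Pi(K+1|Z^n) = 0$).

```python
import numpy as np


def order_estimators(log_post):
    """Return (k_G, k_L) for candidate orders k = 1, ..., K.

    log_post[k - 1] is log Pi(k | Z^n) up to an additive constant,
    so only differences between entries matter.
    """
    lp = np.asarray(log_post, dtype=float)
    # Global estimator: mode of the posterior on k (orders are 1-based).
    k_global = int(np.argmax(lp)) + 1
    # Local estimator: first k such that Pi(k | Z^n) > Pi(k + 1 | Z^n).
    for k in range(1, len(lp)):
        if lp[k - 1] > lp[k]:
            return k_global, k
    # No strict decrease among the K candidates: with Pi(K + 1 | Z^n) = 0,
    # the definition stops at K.
    return k_global, len(lp)


# Bimodal posterior: the local estimator stops before the global mode.
print(order_estimators(np.log([0.1, 0.3, 0.25, 0.35])))  # prints (4, 2)
```

With the bimodal masses $(0.1, 0.3, 0.25, 0.35)$ the global estimator picks $k = 4$ while the local one stops at $k = 2$, which illustrates $\hat k_n^L \le \hat k_n^G$; for a unimodal posterior both coincide.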

1.2. Results in perspective. In this paper we prove that the underestimation errors are $O(e^{-an})$ (some $a > 0$); see Theorem 1. We also show that the overestimation errors are $O((\log n)^b/n^c)$ (some $b > 0$, $c > 0$); see Theorems 2 and 3. All constants can be expressed explicitly, even though they are quite complicated. We apply these results in a regression model and in a change points problem. Finally, we show that our results apply to the important class of mixture models. Mixture models have interesting nonregularity properties and, in particular, even though the mixing distribution is identifiable, testing on the order of the model has proved to be difficult; see, for instance, [6]. There, we obtain an underestimation error of order $O(e^{-an})$ and an overestimation error of order $O((\log n)^b/\sqrt{n})$ ($b > 0$); see Theorem 4.

Efficiency issues in the order estimation problem have been studied mainly in the frequentist literature; see [4] for a review on these results. There is an extensive literature on Bayesian estimation of mixture models and, in particular, on the order selection in mixture models. However, this literature is essentially devoted to determining coherent noninformative priors (see, e.g., [15]) and to implementation (see, e.g., [14]). To the best of our knowledge, there is hardly any work on frequentist properties of Bayesian estimators such as $\hat k_n^G$ and $\hat k_n^L$ outside the regular case. In the case of mixture models, Ishwaran, James and Sun [10] suggest a Bayesian estimator of the mixing distribution when the number of components is unknown and bounded and study the asymptotic properties of the mixing distribution. It is to be noted that deriving rates of convergence for the order of the model from those of the mixing distribution would be suboptimal, since the mixing distribution converges at a rate at most equal to $n^{-1/4}$, to be compared to our $O((\log n)^b/\sqrt{n})$ ($b > 0$) in Theorem 4.

1.3. Organization of the paper. In Section 2 we state our main results. General bounds are presented in Sections 2.1 (underestimation) and 2.2 (overestimation). The regression and change points examples are treated in Section 2.3. We deal with mixture models in Section 2.4. The main proofs are gathered in Section 3 (underestimation), Section 4 (overestimation) and Section 5 (examples). Section C in the Appendix is devoted to an aspect of mixture models which might be of interest in its own right.

2. Efficiency bounds. Hereafter, the integral $\int f\,d\lambda$ of a function $f$ with respect to a measure $\lambda$ is written as $\lambda f$.

Let $L_+^1(\mu)$ be the subset of all nonnegative functions in $L^1(\mu)$. For every $f \in L_+^1(\mu) \setminus \{0\}$, the measure $P_f$ is defined by its derivative $f$ with respect to $\mu$. For every $f, f' \in L_+^1(\mu)$, we set $V(f, f') = P_f(\log f - \log f')^2$ [with convention $V(f, f') = \infty$ whenever necessary].

Let $\ell^* = \log f^*$. For all $\theta, \theta' \in \Theta_\infty$, we set $\ell_\theta = \log f_\theta$ and define $H(\theta, \theta') = P_\theta(\ell_\theta - \ell_{\theta'})$ when $P_\theta \ll P_{\theta'}$ ($\infty$ otherwise), the Kullback-Leibler divergence between $P_\theta$ and $P_{\theta'}$. We also set $H(\theta) = H(\theta^*, \theta)$ (each $\theta \in \Theta_\infty$).

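Let us define, for every $k \ge 1$, $a, \delta > 0$ and $t \in \Theta_k$, $\theta \in \Theta_\infty$,

$$l_{t,\delta} = \inf\{f_{\theta'} : \theta' \in \Theta_k, d(t, \theta') < \delta\}, \qquad u_{t,\delta} = \sup\{f_{\theta'} : \theta' \in \Theta_k, d(t, \theta') < \delta\},$$
$$H_k^* = \inf\{H(\theta') : \theta' \in \Theta_k\}, \qquad S_k(\delta) = \{\theta' \in \Theta_k : H(\theta') \le H_k^* + \delta/2\},$$
$$q(\theta, a) = P^*(\ell^* - \ell_\theta)^2 e^{a(\ell^* - \ell_\theta)} + V(f^*, f_\theta) \in [0, \infty].$$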

Throughout this paper we suppose that the following standard conditions are satisfied: for every $k \ge 1$, $(\Theta_k, d)$ is compact and $\theta \mapsto \ell_\theta(z)$ from $\Theta_k$ to $\mathbb{R}$ is continuous for every $z \in \mathcal{Z}$. By definition of $k^*$, we have $H_k^* = 0$ for all $k \ge k^*$ and $H_k^* > 0$ otherwise.

We consider now two assumptions that are useful for controlling the underestimation and overestimation errors.

A1. For each $k \ge 1$, there exist $a, \delta_0 > 0$, $M > 1$ such that, for all $\delta \in (0, \delta_0]$,
$$\sup\{q(\theta, a) : \theta \in S_k(\delta)\} \le M.$$

A2. For every $k \ge 1$ and $\theta \in \Theta_k$, there exists $\eta_\theta > 0$ such that
$$V(u_{\theta,\eta_\theta}, f^*) + V(f^*, l_{\theta,\eta_\theta}) + V(f^*, u_{\theta,\eta_\theta}) + V(u_{\theta,\eta_\theta}, f_\theta) < \infty.$$

Assumption A1 states the existence of some (rather than any) exponential moment for log ratios of densities $(\ell^* - \ell_\theta)$ for $\theta$ ranging over some neighborhood of $\theta^*$ and was also considered in [4].

2.1. Underestimation. We first deal with the underestimation errors.

Theorem 1. Assume that A1 and A2 are satisfied and that $\pi_k\{S_k(\delta)\} > 0$ for all $\delta > 0$ and $k = 1, \ldots, k^*$.

(i) There exist $c_1', c_2' > 0$ such that, for every $n \ge 1$,

(1) $P^{*n}\{\hat k_n^G < k^*\} \le c_1' e^{-nc_2'}$.

(ii) If, in addition, $H_k^* > H_{k+1}^*$ for $k = 1, \ldots, k^* - 1$, then there exist $c_1, c_2 > 0$ such that, for every $n \ge 1$,

(2) $P^{*n}\{\hat k_n^L < k^*\} \le c_1 e^{-nc_2}$.

The proof of Theorem 1 is postponed to Section 3.

According to (1) and (2), both underestimation probabilities decay exponentially fast. This is the best achievable rate, as follows from a variant of the Stein lemma (see Theorem 2.1 in [2] and Lemma 3 in [4]).

Values of the constants $c_1, c_1', c_2, c_2'$ can be found in the proof of Theorem 1. Evaluating them is difficult [see (9) for a lower bound on $c_2$ in the regression model]. However, we think that they shed some light on the underestimation phenomenon. It is natural to compare our underestimation exponents $c_2$ and $c_2'$ to the constant that appears in Stein's lemma, namely, $\inf_{\theta\in\Theta_{k^*-1}} H(\theta, \theta^*)$. The constants do not match, which does not necessarily mean that $\hat k_n^G$ and $\hat k_n^L$ are not optimal. We refer to [4] for a discussion about optimality.

2.2. Overestimation. Let the largest integer which is strictly smaller than $a \in \mathbb{R}$ be denoted by $\lfloor a \rfloor$. For simplicity, let $a \vee b$ and $a \wedge b$ be the maximum and minimum of $a, b \in \mathbb{R}$, and $V(\theta) = V(f^*, f_\theta) \vee V(f_\theta, f^*)$ ($\theta \in \Theta_\infty$). It is crucial in our study of overestimation errors that, if A1 is satisfied and $C_1 = 5(1 + \log M)/(2a^2)$, then (following Lemma 5 and Theorem 5 in [20]) for all $k \ge k^*$ and $\theta \in \Theta_k$, $H(\theta) \le e^{-2}$ yields

(3) $V(\theta) \le C_1 H(\theta)\log^2 H(\theta)$.
Let us now introduce further notions and assumptions. Given $\delta > 0$ and two functions $l \le u$, the bracket $[l, u]$ is the set of all functions $f$ with $l \le f \le u$. We say that $[l, u]$ is a $\delta$-bracket if $l, u \in L_+^1(\mu)$ and

$$\mu(u - l) \le \delta, \qquad P^*(\log u - \log l)^2 \le \delta^2,$$
$$P_u(\log u - \log f^*)^2 \le \delta\log^2\delta \quad\text{and}\quad P_u(\log u - \log l)^2 \le \delta\log^2\delta.$$

For $\mathcal{C}$ a class of functions, the $\delta$-entropy with bracketing of $\mathcal{C}$ is the logarithm $\mathcal{E}(\mathcal{C}, \delta)$ of the minimum number of $\delta$-brackets needed to cover $\mathcal{C}$. A set of cardinality $\exp(\mathcal{E}(\mathcal{C}, \delta))$ of $\delta$-brackets which covers $\mathcal{C}$ is written as $\mathcal{H}(\mathcal{C}, \delta)$.

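For all $\theta \in \Theta_\infty$ we introduce the following quantities: $\ell_n(\theta) = \sum_{i=1}^n \ell_\theta(Z_i)$, $\ell_n^* = \sum_{i=1}^n \ell^*(Z_i)$ and, for every $k \ge 1$, $B_n(k) = \pi(k)\int_{\Theta_k} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_k(\theta)$. Since $\Pi(k|Z^n) = B_n(k)/\sum_{k'\ge1} B_n(k')$, if $k < k'$ are two integers, then $\hat k_n^L = k$ yields $B_n(k) > B_n(k+1)$ and $\hat k_n^G = k$ implies that $B_n(k) \ge B_n(k')$.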
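Because $\Pi(k|Z^n) \propto B_n(k)$, the factor $e^{-\ell_n^*}$, which depends on the unknown $f^*$, cancels in the posterior on $k$. The Python sketch below, an illustration only, estimates $\log B_n(k)$ by plain Monte Carlo over draws $\theta_1, \ldots, \theta_m$ from $\pi_k$, using a log-mean-exp to avoid overflow. The function name and its inputs are hypothetical, and prior-sampling Monte Carlo is known to have high variance for large $n$; it is meant to show the structure of $B_n(k)$, not to serve as an efficient marginal likelihood estimator.

```python
import numpy as np


def log_B_n(loglik_draws, loglik_star, log_prior_k):
    """Plain Monte Carlo estimate of log B_n(k) (illustrative, high variance).

    loglik_draws : l_n(theta_m) for draws theta_1, ..., theta_m from pi_k
    loglik_star  : l_n^* = sum_i log f^*(Z_i); it cancels in Pi(k | Z^n)
    log_prior_k  : log pi(k)
    """
    x = np.asarray(loglik_draws, dtype=float) - loglik_star
    x_max = x.max()
    # log((1/m) * sum_m exp(x_m)), shifted by the largest term to avoid overflow
    log_mean_exp = x_max + np.log(np.mean(np.exp(x - x_max)))
    return log_prior_k + log_mean_exp
```

Since only differences of $\log B_n(k)$ across $k$ matter for $\Pi(k|Z^n)$, any common value of `loglik_star` can be used in practice.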

Let $K > k^*$ be an integer. We consider the following three assumptions:

O1(K). There exist $C_2, D_1(k) > 0$ ($k = k^* + 1, \ldots, K$) such that, for every sequence $\{\delta_n\}$ decreasing to 0, for all $n \ge 1$, and all $k \in \{k^* + 1, \ldots, K\}$,

$$\pi_k\{\theta \in \Theta_k : H(\theta) \le \delta_n\} \le C_2\,\delta_n^{D_1(k)/2}.$$

O2(K). There exists $C_3 > 0$ such that, for each $k \in \{k^* + 1, \ldots, K\}$, there exists a sequence $\{\mathcal{F}_n\}$, $\mathcal{F}_n \subset \Theta_k$, such that, for all $n \ge 1$,

O3. There exist $\beta_1, L, D_2(k^*) > 0$, and $\beta_2 \ge 0$ such that, for all $n \ge 1$,
When O3 holds, let $n_0$ be the smallest integer $n$ such that

(4) $\delta_0 = 4\max\{m^{-1}\log[\beta_1(\log m)^{\beta_2}m^{D_2(k^*)/2}] : m \ge n\} \le e^{-2}/2$.

When O1(K) and O3 hold with $D_2(k^*) < \min_{k^*<k\le K} D_1(k)$, given any $s > 0$, we set $\delta_{k,n} = \delta_{k,1} n^{-1}\log^3 n$ for all $n \ge 2$, $k \in \{k^* + 1, \ldots, K\}$, with

(5) $\delta_{k,1} \ge 128(1 + s)(C_1 + 2)[D_1(k) - D_2(k^*)] \vee 128 C_1 D_1(k) \vee \log^{-3} n_0$.

We control the overestimation error for $\hat k_n^G$ when a prior bound $k_{\max}$ on $k^*$ is known.

Theorem 2. If the prior $\Pi$ puts mass 1 on $\bigcup_{k\le k_{\max}}\Theta_k$ with $k^* \le k_{\max}$; if A1, A2, O1($k_{\max}$), O2($k_{\max}$) and O3 are satisfied with $D_2(k^*) < \min_{k^*<k\le k_{\max}} D_1(k)$; if, in addition, for every $k \in \{k^* + 1, \ldots, k_{\max}\}$, for all integers $n \ge n_0$ such that $\delta_{k,n} \le \delta_0$ and for every $j \le \lfloor\delta_0/\delta_{k,n}\rfloor$,

(6) $\mathcal{E}\big(\mathcal{F}_n \cap [S_k(2(j+1)\delta_{k,n}) \setminus S_k(2j\delta_{k,n})],\, j\delta_{k,n}/4\big) \le \dfrac{s}{1+s}\,\dfrac{nj\delta_{k,n}}{64(C_1 + 2)\log^2(j\delta_{k,n})}$,

then there exists $C_3' > 0$ such that, for all $n \ge n_0$,

(^) ^*"{^n > k*} < C's ^mmfe[Di(fc)-D2(fc*)]/2 " 

In the formula above, the index $k$ ranges between $k^* + 1$ and $k_{\max}$.

By contrast, the following result on the overestimation error of $\hat k_n^L$ does not rely on a prior bound on $k^*$.

Theorem 3. Let $k = k^* + 1$. Let us suppose that assumptions A1, A2, O1($k$), O2($k$) and O3 are satisfied with $D_2(k^*) < D_1(k)$. If, in addition, for all integers $n \ge n_0$ such that $\delta_{k,n} \le \delta_0$ and for every $j \le \lfloor\delta_0/\delta_{k,n}\rfloor$, equation (6) is satisfied, then there exists $C_3' > 0$ such that, for all $n \ge n_0$,

(8) $P^{*n}\{\hat k_n^L > k^*\} \le C_3'\,\dfrac{(\log n)^{3D_1(k^*+1)/2+\beta_2}}{n^{[D_1(k^*+1)-D_2(k^*)]/2}}$.

Proofs of Theorems 2 and 3 rely on tests of $P^*$ versus complements $\{P_\theta : \theta \in \Theta_k, H(\theta) > \epsilon\}$ of Kullback-Leibler balls around $P^*$ for $k > k^*$, in the spirit of [8]. They are postponed to Section 4. The upper bounds we get in the proofs are actually tighter than the ones stated in the theorems: each time, we chose the largest of several terms to make the formulas more readable. Besides, the possibility in Theorem 3 to tune the value of $\delta_{k,1}$ makes it easier to apply the theorem to the mixture model example. Naturally, the larger $\delta_{k,1}$, the larger $C_3'$ and the less accurate the overestimation bound.

Condition (6) ensures that (a critical region of) $\Theta_k$ is not too large, since the entropy is known to quantify the complexity of a model.

Assumption O1 is concerned with the decay to 0 of the prior mass of shrinking Kullback-Leibler neighborhoods of $\theta^*$. Verifying this assumption in the mixture setting is a demanding task; see Section 2.4. Note that dimensional indices $D_1(k)$ ($k > k^*$) are introduced, which might be different from the usual dimensions $D(k)$. They should be understood as effective dimensions of $\Theta_k$ relative to $\Theta_{k^*}$. In models of mixtures of $g_\gamma$ densities ($\gamma \in \Gamma \subset \mathbb{R}^d$), for instance, $D_1(k^* + 1) = D(k^*) + 1$, while $D(k^* + 1) = D(k^*) + (d + 1)$. It is to be noted that this assumption is crucial. In particular, in the different context of [16], it is proved that if such a condition is not satisfied, then some inconsistency occurs for the Bayes factor.
Finally, O3 is milder than the existence of a Laplace expansion of the marginal likelihood (which holds in "regular models" as described in [18]), since in such cases (see [18]), for $c$ as large as need be, denoting by $J_n$ the Jacobian matrix, there exist $\delta, C > 0$ such that

$$B_n(k^*) = \pi(k^*)\int_{\Theta_{k^*}} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*}(\theta) \ge \delta\Big(\frac{2\pi}{n}\Big)^{D(k^*)/2}|J_n|^{-1/2}(1 + O_P(1/n)),$$

and $P^{*n}\{|J_n| + |O_P(1/n)| > C\} \le n^{-c}$, implying O3 with $\beta_1 > 0$, $\beta_2 = 0$ and $D_2(k^*) = D(k^*)$. In some cases, however, the dimensional index $D_2(k^*)$ may differ from $D(k^*)$; see, for instance, Lemma 1.

According to (7) and (8), both overestimation errors decay as a negative power of the sample size $n$ (up to a power of a $\log n$ factor). Note that the overestimation rate is necessarily slower than exponential, as stated in another variant of the Stein lemma (see Lemma 3 in [4]).

We want to emphasize that the overestimation rates obtained in Theorems 2 and 3 depend on intrinsic quantities [such as the dimensions $D_1(k)$ and $D_2(k^*)$, and the power $\beta_2$]. By contrast, the rates obtained in Theorems 10 and 11 of [4] depend directly on the choice of a penalty term.

2.3. Regression and change points models. Theorems 1, 2 and 3 (resp. 1 and 3) apply to the following regression (resp. change points) model. In the rest of this section, $\sigma > 0$ is given, $g_\gamma$ is the density of the Gaussian distribution with mean $\gamma$ and variance $\sigma^2$; $X_1, \ldots, X_n$ are i.i.d. and uniformly distributed on $[0, 1]$, $e_1, \ldots, e_n$ are i.i.d. with density $g_0$ and independent from $X_1, \ldots, X_n$. Moreover, one observes $Z_i = (X_i, Y_i)$ with $Y_i = \varphi_{\theta^*}(X_i) + e_i$ ($i = 1, \ldots, n$), where the definition of $\varphi_{\theta^*}$ depends on the example.

Regression (see also Section 5.3 of [4]). Let $\{t_k\}_{k\ge1}$ be a uniformly bounded system of continuous functions on $[0, 1]$ forming an orthonormal system in $L^2([0, 1])$ (for the Lebesgue measure). Let $\Gamma$ be a compact subset of $\mathbb{R}$ that contains 0 and $\Theta_k = \Gamma^k$ (each $k \ge 1$). For every $\theta \in \Theta_k$, set $\varphi_\theta = \sum_{j=1}^k \theta_j t_j$ and $f_\theta(z) = g_{\varphi_\theta(x)}(y)$ [all $z = (x, y) \in [0, 1] \times \mathbb{R}$].

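Change points. For each $k \ge 1$, let $\mathcal{T}_k$ be the set of $(k+1)$-tuples $(t_j)_{0\le j\le k}$ with $t_0 = 0$, $t_j < t_{j+1}$ (all $j < k$), and $t_k = 1$. Let $\Gamma$ be a compact subset of $\mathbb{R}$ and $\Theta_k = \Gamma^k \times \mathcal{T}_k$ (each $k \ge 1$). For every $\theta = (a, t) \in \Theta_k$, set $\varphi_\theta(x) = \sum_{j=1}^k a_j\mathbf{1}\{t_{j-1} \le x < t_j\}$, and $f_\theta(z) = g_{\varphi_\theta(x)}(y)$ (all $z = (x, y) \in [0, 1] \times \mathbb{R}$).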
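In both examples the Kullback-Leibler divergence reduces to $H(\theta) = \|\varphi_\theta - \varphi_{\theta^*}\|_2^2/(2\sigma^2)$, because the two Gaussian densities share the variance $\sigma^2$ and $X$ is uniform on $[0, 1]$. For the change points model, $\varphi_\theta$ is piecewise constant, so this norm is a sum over overlaps of the intervals $[t_{j-1}, t_j[$ and $[t^*_{j-1}, t^*_j[$. The Python sketch below, with hypothetical function names, evaluates $\varphi_\theta$ and this divergence; it only illustrates the definitions above.

```python
import numpy as np


def step_function(a, t, x):
    """phi_theta(x) = sum_j a_j 1{t_{j-1} <= x < t_j}, with t_0 = 0 < ... < t_k = 1."""
    idx = np.searchsorted(t, x, side="right") - 1   # index j - 1 of the piece containing x
    idx = np.clip(idx, 0, len(a) - 1)               # send x = 1 to the last piece (a null set)
    return np.asarray(a, dtype=float)[idx]


def kl_change_points(a, t, a_star, t_star, sigma):
    """H(theta) = sum_{j, j'} (a*_j - a_j')^2 mu(tau*_j cap tau_j') / (2 sigma^2)."""
    total = 0.0
    for j in range(len(a_star)):
        for jp in range(len(a)):
            # Lebesgue measure of [t*_{j-1}, t*_j[ intersected with [t_{j'-1}, t_{j'}[
            overlap = min(t_star[j + 1], t[jp + 1]) - max(t_star[j], t[jp])
            if overlap > 0:
                total += (a_star[j] - a[jp]) ** 2 * overlap
    return total / (2 * sigma ** 2)
```

For instance, with $\theta^* = ((0, 1), (0, 0.5, 1))$ and $\theta = ((0, 1, 1), (0, 0.4, 0.7, 1))$, the two functions differ only on $[0.4, 0.5[$, so $H(\theta) = 0.1/(2\sigma^2)$ even though $\theta$ uses one extra change point.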

In both examples there exists $\theta^* \in \Theta_{k^*} \setminus \Theta_{k^*-1}$ such that $f^* = f_{\theta^*}$. The standard conditions of compactness and continuous parameterization are fulfilled, and A1 and A2 are satisfied. Besides, $2\sigma^2 H(\theta) = \|\varphi_\theta - \varphi_{\theta^*}\|_2^2$ (all $\theta \in \Theta_\infty$), so the additional condition stated in Theorem 1(ii) holds. Consequently, if $\pi_k$ is positive on $\Theta_k$ for each $k \ge 1$, then Theorem 1 applies. In particular, using the Fourier basis in the regression model, we get



/ 1 Afc+i 2^=* 



(9) I2c2>l/max -^ + -^ + (1 + A^h 
where $\Delta_{k+1} = (\theta_{k+1}^*)^{-2}\sum_{j=k+2}^{k^*}(\theta_j^*)^2$ if $k + 1 < k^*$ and $\Delta_{k^*} = 0$.
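In the regression model, orthonormality of $\{t_k\}$ turns the identity $2\sigma^2 H(\theta) = \|\varphi_\theta - \varphi_{\theta^*}\|_2^2$ into $H(\theta) = \sum_j(\theta_j - \theta_j^*)^2/(2\sigma^2)$, after padding the shorter coefficient vector with zeros (the models are nested). The Python sketch below checks this numerically for one choice of trigonometric basis; the ordering $t_1 = 1$, $t_{2m} = \sqrt{2}\cos(2\pi m x)$, $t_{2m+1} = \sqrt{2}\sin(2\pi m x)$ is an assumption, since the text only says "Fourier basis", and the function names are hypothetical.

```python
import numpy as np


def trig_basis(j, x):
    """Assumed ordering of an orthonormal trigonometric system on [0, 1]."""
    if j == 1:
        return np.ones_like(x)
    m = j // 2
    if j % 2 == 0:
        return np.sqrt(2.0) * np.cos(2 * np.pi * m * x)
    return np.sqrt(2.0) * np.sin(2 * np.pi * m * x)


def phi(theta, x):
    """phi_theta = sum_j theta_j t_j."""
    return sum(c * trig_basis(j + 1, x) for j, c in enumerate(theta))


def kl_regression(theta, theta_star, sigma, n_grid=4096):
    """H(theta) = ||phi_theta - phi_theta*||_2^2 / (2 sigma^2) with X ~ U[0, 1]."""
    # A uniform grid on [0, 1) integrates low-frequency trigonometric polynomials exactly.
    x = np.arange(n_grid) / n_grid
    k = max(len(theta), len(theta_star))
    a = np.pad(np.asarray(theta, dtype=float), (0, k - len(theta)))
    b = np.pad(np.asarray(theta_star, dtype=float), (0, k - len(theta_star)))
    return np.mean(phi(a - b, x) ** 2) / (2 * sigma ** 2)
```

For $\theta = (1, 0.5, -0.2)$, $\theta^* = (0.8, 0.5)$ and $\sigma = 0.5$, both the grid average and the coefficient formula give $(0.2^2 + 0.2^2)/(2 \cdot 0.25) = 0.16$.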

Also, it can be shown that there exists $r > 1$ such that $[l_{\theta,\delta/r}, u_{\theta,\delta/r}]$ is a $\delta$-bracket for all $\theta \in \Theta_k$ and $\delta$ sufficiently small. Consequently, with the notation of Theorems 2 and 3, and with $\mathcal{F}_n = \Theta_k$ (O2 is then trivial), $\mathcal{E}(\Theta_k, j\delta_{k,n}/4) \le -b\log(j\delta_{k,n}) + c$ for positive $b, c$, and we show in Appendix D how this implies the desired condition on entropy.

The regression model is regular (as described in [18]), so O3 holds with $D_2(k^*) = D(k^*)$. Moreover, the form of $H(\theta)$ makes it easy to verify that O1(K) is satisfied for any $K > k^*$ with $D_1 = D$. Thus, Theorems 2 and 3 apply too. Furthermore, Theorem 3 applies in the change points model because of the following lemma, valid for any $\tau \in (0, \frac12)$ (see Appendix A for a proof).

Lemma 1. In the change points model, O1($k^* + 1$) and O3 hold with $D_1(k^* + 1) = D(k^*) + k^*$, $D_2(k^*) = D(k^*) + k^* - 1 + 2\tau$ and $\beta_2 = 0$.

Actually, the proof of Lemma 1 can easily be adapted to show that O1(K) holds for any $K > k^*$ with $D_1(K) = D(k^*) + K - 1$ (we omit the details for the sake of conciseness). So Theorem 2 also applies in that model.

2.4. Mixture models. We prove that Theorems 1 and 3 apply here with $D_1(k^* + 1) = D(k^*) + 1$ and $D_2(k^*) = D(k^*)$, yielding an overestimation rate of order $O((\log n)^c/\sqrt{n})$ for some positive $c$.

We denote by $|\cdot|_1$ and $|\cdot|_2$ the $\ell^1$ and $\ell^2$ norms on $\mathbb{R}^d$. Let $\Gamma$ be a compact subset of $\mathbb{R}^d$ and $\Delta = \{g = (g_1, \ldots, g_k) \in \Gamma^k : \min_{j\ne j'}|g_j - g_{j'}|_2 = 0\}$. For all $\gamma \in \Gamma$, let $g_\gamma$ be a density. In this section mixtures of $g_\gamma$'s are studied. Formally, $\Theta_1 = \Gamma$ and for every $k \ge 2$,

$$\Theta_k = \Big\{\theta = (p, \gamma) : p = (p_1, \ldots, p_{k-1}) \in \mathbb{R}_+^{k-1}, \sum_{j=1}^{k-1} p_j \le 1, \gamma \in \Gamma^k\Big\}.$$

Every $\theta \in \Theta_k$ ($k \ge 2$) is associated with $f_\theta = \sum_{j=1}^{k-1} p_j g_{\gamma_j} + (1 - \sum_{j=1}^{k-1} p_j)g_{\gamma_k}$. Note that $D(k) = k(d + 1) - 1$ for each $k \ge 1$. Also, the standard conditions of compactness and continuous parameterization are fulfilled.
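To fix ideas on the parameterization of $\Theta_k$, here is a Python sketch (function names hypothetical) that evaluates $f_\theta$ for the Gaussian location-scale family of Corollary 1, where $\gamma = (\mu, \sigma^2)$ and $d = 2$: only the $k - 1$ first weights are free, the last one is $1 - \sum_{j<k} p_j$, and the dimension count is $D(k) = (k - 1) + kd = k(d + 1) - 1$.

```python
import numpy as np


def gauss_pdf(y, mu, var):
    """Gaussian density g_gamma(y) with gamma = (mu, var)."""
    return np.exp(-(y - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def mixture_density(y, p, gammas):
    """f_theta(y) for theta = (p, gamma) in Theta_k, where k = len(gammas).

    p      : the k - 1 free weights p_1, ..., p_{k-1} (p_j >= 0, sum <= 1)
    gammas : k pairs (mu_j, var_j); the last component gets 1 - sum(p).
    """
    p = np.asarray(p, dtype=float)
    k = len(gammas)
    if p.shape != (k - 1,) or np.any(p < 0) or p.sum() > 1:
        raise ValueError("p must hold k - 1 nonnegative weights summing to at most 1")
    weights = np.append(p, 1.0 - p.sum())
    return sum(w * gauss_pdf(y, mu, var) for w, (mu, var) in zip(weights, gammas))


def dimension(k, d):
    """D(k) = k(d + 1) - 1: k - 1 free weights plus k parameters in R^d."""
    return k * (d + 1) - 1
```

Setting some $p_j = 0$ or letting two $\gamma_j$ coincide gives the same density as a point of $\Theta_{k-1}$, which is the source of the nonregularity discussed in the introduction.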

We consider the following six assumptions, which will be used in the mixture case. The first-, second- and third-order differentiation (with respect to $\gamma$) operators are denoted by $\nabla$, $D^2$ and $D^3$, and $|\cdot|$ stands for any norm on the space of second- and third-order derivatives. We say that a function is $C^k$ if it is $k$ times continuously differentiable:

M1. For each $k \ge 1$, the prior $\pi_k$ writes as $d\pi_k(\theta) = \pi_k^p(p)\pi_k^\gamma(\gamma)\,dp\,d\gamma$ [all $\theta = (p, \gamma) \in \Theta_k$]. It is $C^1$ over $\Theta_k$. Moreover, there exist $\epsilon, C > 0$ such that, setting $\Delta_\epsilon = \{\gamma \in \Gamma^k : \inf_{g\in\Delta}|\gamma - g|_1 \ge \epsilon\}$, $\gamma \in \Delta_\epsilon$ yields $\pi_k^\gamma(\gamma) \ge C$, and when $d = 1$, $\pi_k^\gamma(\gamma) \propto \prod_{j<j'}|\gamma_j - \gamma_{j'}|_2$ upon $\Delta_\epsilon^c$.
M2. For all $\gamma \in \Gamma$, $\eta > 0$, let us define $\underline{g}_{\gamma,\eta} = \inf\{g_{\gamma'} : |\gamma - \gamma'|_1 \le \eta\}$ and $\overline{g}_{\gamma,\eta} = \sup\{g_{\gamma'} : |\gamma - \gamma'|_1 \le \eta\}$. There exist $\eta_1, M > 0$ such that, for every $\gamma_1, \gamma_2 \in \Gamma$, there exists $\eta_2 > 0$ such that

^^.i..i -£,,„, l°g' 571,.! ^ ^^^1 log' ^1 
and 

P-g,^ „, (log' 9^1 ,m + log' 572 ) + Pg,, (log' 571 ,m + log' £7, ,,1 ) ^ M- 
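M3. For every $\gamma_1, \gamma_2 \in \Gamma$, there exists $a > 0$ such that

$$\sup\{P_{g_{\gamma_1}}(g_{\gamma_2}/g_\gamma)^a : \gamma \in \Gamma\} < \infty.$$

M4. The parameterization $\gamma \mapsto g_\gamma(z)$ is $C^2$ for $\mu$-almost every $z \in \mathcal{Z}$. Moreover, $\mu[\sup_{\gamma\in\Gamma}(|\nabla g_\gamma|_1 + |D^2 g_\gamma|)]$ is finite.

The parameterization $\gamma \mapsto \log g_\gamma(z)$ is $C^3$ for $\mu$-almost every $z \in \mathcal{Z}$ and, for every $\gamma \in \Gamma$, the Fisher information matrix $I(\gamma)$ is positive definite. Besides, for all $\gamma_1, \gamma_2 \in \Gamma$, there exists $\eta > 0$ for which $P_{g_{\gamma_1}}|D^2\log g_{\gamma_2}|^2 + P_{g_{\gamma_1}}\sup\{|D^3\log g_\gamma|^2 : |\gamma - \gamma_2|_1 \le \eta\} < \infty$.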

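M5. Let $\mathcal{T} = \{(r, s) : 1 \le r \le s \le d\}$. There exist a nonempty subset $\mathcal{A}$ of $\mathcal{T}$ and two constants $\eta_0, a > 0$ such that, for every $k \ge 2$, for every $k$-tuple $(\gamma_1, \ldots, \gamma_k)$ of pairwise distinct elements of $\Gamma$:

(a) the functions $g_{\gamma_j}$, $(\nabla g_{\gamma_j})_l$ ($j \le k$, $l \le d$) are linearly independent;

(b) for every $j \le k$, the functions $g_{\gamma_j}$, $(\nabla g_{\gamma_j})_l$, $(D^2 g_{\gamma_j})_{rs}$ (all $l \le d$, $(r, s) \in \mathcal{A}$) are linearly independent;

(c) for each $j \le k$ and $(r, s) \in \mathcal{T} \setminus \mathcal{A}$, there exist $\lambda_0^{rs}, \ldots, \lambda_d^{rs} \in \mathbb{R}$ such that

$$(D^2 g_{\gamma_j})_{rs} = \lambda_0^{rs} g_{\gamma_j} + \sum_{l=1}^d \lambda_l^{rs}(\nabla g_{\gamma_j})_l;$$

(d) for all $\eta \le \eta_0$ and all $u, v \in \mathbb{R}^d$, for each $j \le k$, if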



{r,s)fA 



Y^ {\UrUs\ + \VrVs\) + 
{r,s)€A 

then $|u|_2 + |v|_2 \le a\eta$.

These assumptions suffice to guarantee the bounds below.



< 



Theorem 4. If M1-M5 are satisfied, then there exist $n_1 \ge 1$ and $C_4 > 0$ such that, for all $n \ge n_1$,

(10) $P^{*n}\{\hat k_n^L < k^*\} \le c_1 e^{-nc_2}$,

(11) $P^{*n}\{\hat k_n^L > k^*\} \le C_4\,\dfrac{(\log n)^{3[D(k^*)+1]/2}}{\sqrt{n}}$.

The positive constants $c_1, c_2$ are defined in Theorem 1.
Note that all assumptions involve the mixed densities $g_\gamma$ ($\gamma \in \Gamma$) rather than the resulting mixture densities $f_\theta$ ($\theta \in \Theta_\infty$). Assumption M2 implies A2 and M3 implies A1. Assumption M4 is a usual regularity condition. Assumption M5 is a weaker version of the strong identifiability condition defined by [5], which is assumed in most papers dealing with asymptotic properties of mixtures. In particular, strong identifiability does not hold in location-scale mixtures of Gaussian r.v., but M5 does (with $\mathcal{A} = \mathcal{T} \setminus \{(1, 1)\}$). In fact, Theorem 4 applies, and we have the following:

Corollary 1. Set $A, B > 0$ and $\Gamma = \{(\mu, \sigma^2) \in [-A, A] \times [1/B, B]\}$. For every $\gamma = (\mu, \sigma^2) \in \Gamma$, let us denote by $g_\gamma$ the Gaussian density with mean $\mu$ and variance $\sigma^2$. Then (10) and (11) hold with $d = 2$ for all $n \ge n_1$.

Other examples include, for instance, mixtures of Gamma($a$, $b$) in $a$ or in $b$ [but not in $(a, b)$], of Beta($a$, $b$) in $(a, b)$, and of GIG($a$, $b$, $c$) in $(b, c)$ (another example where strong identifiability does not hold, but M5 does).

3. Underestimation proofs. Let us start with new notation. For $f, f' \in L_+^1(\mu) \setminus \{0\}$, we set $H(f, f') = P_f(\log f - \log f')$ when it is defined ($\infty$ otherwise), $H(f) = H(f^*, f)$, and $V(f) = V(f^*, f) \vee V(f, f^*)$. For every $\theta \in \Theta_\infty$, the following shortcuts will be used ($W$ stands for $H$ or $V$): $W(f, f_\theta) = W(f, \theta)$, $W(f_\theta, f) = W(\theta, f)$, $W(f_\theta) = W(\theta)$. For every probability density $f \in L_+^1(\mu)$, $P_f^{\otimes n}$ is denoted by $P_f^n$ and the expectation with respect to $P_f$ (resp. $P_f^n$) by $E_f$ (resp. $E_f^n$).

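Theorem 1 relies on the following lower bound on $B_n(k)$.

Lemma 2. Let $k \le k^*$ and $\delta \in (0, aM \wedge \delta_0]$. Under the assumptions of Theorem 1, with probability at least $1 - 2\exp\{-n\delta^2/8M\}$,

$$B_n(k) \ge \frac{\pi(k)\pi_k\{S_k(\delta)\}}{2}e^{-n[H_k^* + \delta]}.$$

Proof. Let $1 \le k \le k^*$, $0 < \delta \le aM \wedge \delta_0$ and define

$$B = \{(\theta, Z^n) \in \Theta_k \times \mathcal{Z}^n : \ell_n(\theta) - \ell_n^* > -n[H_k^* + \delta]\}.$$

Then, using the same calculations as in Lemma 1 of [17], we obtain

(12) $P^{*n}\Big\{\pi_k\{\theta \in S_k(\delta) : (\theta, Z^n) \notin B\} > \dfrac{\pi_k\{S_k(\delta)\}}{2}\Big\} \le \dfrac{2}{\pi_k\{S_k(\delta)\}}\displaystyle\int_{S_k(\delta)} P^{*n}\{(\theta, Z^n) \notin B\}\,d\pi_k(\theta)$.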

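Set $s \in [0, a]$ and $\theta \in S_k(\delta)$ and let $\varphi_\theta(t) = P^* e^{t(\ell^* - \ell_\theta)}$ (every $t \in \mathbb{R}$). By virtue of A1, the function $\varphi_\theta$ is $C^\infty$ over $[0, a]$ and $\varphi_\theta''$ is bounded by $q(\theta, a) \le M$ on that interval. Moreover, a Taylor expansion implies that

$$\varphi_\theta(s) = 1 + sH(\theta) + s^2\int_0^1(1 - t)\varphi_\theta''(st)\,dt \le 1 + sH(\theta) + \tfrac12 s^2 M.$$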
Applying the Chernoff method and the inequality $\log t \le t - 1$ ($t > 0$) implies that, for all $\theta \in S_k(\delta)$,

$$P^{*n}\{(\theta, Z^n) \notin B\} \le \exp\{-ns[H_k^* + \delta] + n\log\varphi_\theta(s)\} \le \exp\{-ns[H_k^* + \delta - H(\theta)] + ns^2M/2\}.$$

We choose $s = [H_k^* + \delta - H(\theta)]/M \in [\delta/2M, a]$ so that the above probability is bounded by $\exp\{-n\delta^2/8M\}$ and Lemma 2 is proved. □
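The choice of $s$ maximizes the concave exponent $s\kappa - s^2M/2$, where $\kappa = H_k^* + \delta - H(\theta)$. For $\theta \in S_k(\delta)$ one has $\kappa \in [\delta/2, \delta]$, so the maximum $\kappa^2/(2M)$ is at least $\delta^2/(8M)$, and $s = \kappa/M \le \delta/M \le a$ because $\delta \le aM$. A small Python check of this arithmetic (the function name and the symbol $\kappa$ are introduced here for illustration only):

```python
def lemma2_exponent(H_star, delta, H_theta, M, a):
    """Return (s, exponent) for the Chernoff step in the proof of Lemma 2.

    The bound reads P{(theta, Z^n) not in B} <= exp{-n * exponent}, with
    exponent = s * kappa - s**2 * M / 2 and kappa = H_k^* + delta - H(theta).
    """
    if H_theta < H_star:
        raise ValueError("H(theta) cannot be below its infimum H_k^*")
    kappa = H_star + delta - H_theta
    if kappa < delta / 2:
        raise ValueError("theta must lie in S_k(delta): H(theta) <= H_k^* + delta/2")
    if not 0 < delta <= a * M:
        raise ValueError("need 0 < delta <= a*M so that the chosen s stays in [0, a]")
    s = kappa / M                           # maximizer of the concave quadratic in s
    exponent = s * kappa - 0.5 * M * s ** 2  # equals kappa**2 / (2M) >= delta**2 / (8M)
    return s, exponent
```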

To prove Theorem 1, we construct nets of upper bounds for the $f_\theta$'s ($\theta \in \Theta_k$, $k = 1, \ldots, k^* - 1$). Similar nets were first introduced in a context of nonparametric Bayesian estimation in [3]. We focus on $\hat k_n^L$; the proof for $\hat k_n^G$ is a straightforward adaptation.

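Proof of Theorem 1. Since $P^{*n}\{\hat k_n^L < k^*\} = \sum_{k=1}^{k^*-1}P^{*n}\{\hat k_n^L = k\}$, it is sufficient to study $P^{*n}\{\hat k_n^L = k\}$ for $k$ between 1 and $k^* - 1$.

Let $\delta < aM \wedge \delta_0 \wedge [H_k^* - H_{k+1}^*]/2$, $c = \frac12\pi(k+1)\pi_{k+1}\{S_{k+1}(\delta)\} \in (0, 1]$ and $\epsilon = 2\delta/[H_k^* - H_{k+1}^*] \in (0, 1)$. Lemma 2, applied to $B_n(k+1)$, yields

(13) $P^{*n}\{\hat k_n^L = k\} \le P^{*n}\{B_n(k) > B_n(k+1)\} \le 2e^{-n\delta^2/(8M)} + P^{*n}\{B_n(k) > ce^{-n[H_{k+1}^* + \delta]}\}$.

We now study the rightmost term of (13). Let $\theta, \theta' \in \Theta_k$. The dominated convergence theorem and A2 ensure that there exists $\eta_\theta > 0$ such that $d(\theta, \theta') < \eta_\theta$ yields $H(u_{\theta,\eta_\theta}) \le H(\theta') \le H(u_{\theta,\eta_\theta}) + \delta$, $V(\theta^*, u_{\theta,\eta_\theta}) \le (1 + \epsilon)V(\theta^*, \theta')$ and $V(u_{\theta,\eta_\theta}, \theta^*) \le (1 + \epsilon)V(\theta', \theta^*)$. Let $\mathcal{B}(\theta, \eta_\theta) = \{\theta' \in \Theta_k : d(\theta, \theta') < \eta_\theta\}$ for all $\theta \in \Theta_k$. The collection of open sets $\{\mathcal{B}(\theta, \eta_\theta)\}_{\theta\in\Theta_k}$ covers $\Theta_k$, which is a compact set. So, there exist $\theta_1, \ldots, \theta_{N_\delta} \in \Theta_k$ such that $\Theta_k = \bigcup_{j=1}^{N_\delta}\mathcal{B}(\theta_j, \eta_{\theta_j})$. For $j = 1, \ldots, N_\delta$, letting $u_j = u_{\theta_j,\eta_{\theta_j}}$ and

$$\tilde{\mathcal{T}}_{kj} = \{\theta \in \Theta_k : \ell_\theta \le \log u_j, H(\theta) \le H(u_j) + \delta, V(\theta^*, u_j) \le (1 + \epsilon)V(\theta^*, \theta), V(u_j, \theta^*) \le (1 + \epsilon)V(\theta, \theta^*)\},$$

then $\mathcal{T}_{k1} = \tilde{\mathcal{T}}_{k1}$ and $\mathcal{T}_{kj} = \tilde{\mathcal{T}}_{kj} \cap (\bigcup_{j'<j}\tilde{\mathcal{T}}_{kj'})^c$ ($j = 2, \ldots, N_\delta$). The family $\{\mathcal{T}_{k1}, \ldots, \mathcal{T}_{kN_\delta}\}$ is a partition of $\Theta_k$.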

Accordingly, with $\ell_{n,u_j} = \sum_{i=1}^n \log u_j(Z_i)$ ($j = 1, \ldots, N_\delta$), the rightmost term of (13) is smaller than

$$P^{*n}\Big\{\pi(k)\int_{\Theta_k} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_k(\theta) > ce^{-n[H_{k+1}^* + \delta]}\Big\} \le \sum_{j=1}^{N_\delta}P^{*n}\{\ell_{n,u_j} - \ell_n^* > -n[H_{k+1}^* + \delta] + \log c\} \le \sum_{j=1}^{N_\delta}P^{*n}\{\ell_{n,u_j} - \ell_n^* + nH(u_j) > n\beta_j + \log c\}$$

for $\beta_j = H(u_j) - H_{k+1}^* - \delta$. Note that $\beta_j \ge (1 - \epsilon)[H(\theta_j) - H_{k+1}^*] > 0$ for $j = 1, \ldots, N_\delta$ by construction. Applying (29) of Proposition B.1 (whose assumptions are satisfied) finally implies that

P'^"{]B„(fc)>ce-"[^^*+i+'51} 

- c ^\ 2(l + e)^ ^ ^+1^ We, V{e) 'l-ejj 
We conclude, since $N_\delta$ does not depend on $n$. □

4. Overestimation proofs. We choose again to focus on $\hat k_n^L$, the proof for $\hat k_n^G$ being very similar.

Proof of Theorem 3. Set $n_0$ and $\delta_0$ as in (4), then note that $u \mapsto u\log^2 u$ increases on the interval $(0, e^{-2})$. By definition of $\hat k_n^L$,

(14) $P^{*n}\{\hat k_n^L > k^*\} \le P^{*n}\{B_n(k^*) \le B_n(k^* + 1)\} \le P^{*n}\{B_n(k^*) < (\beta_1(\log n)^{\beta_2}n^{D_2(k^*)/2})^{-1}\} + P^{*n}\{B_n(k^* + 1) \ge (\beta_1(\log n)^{\beta_2}n^{D_2(k^*)/2})^{-1}\}$.

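Assumption O3 deals with the first term of the right-hand side of (14). Let us focus on the second one. To this end, $\Theta_{k^*+1}$ is decomposed into the following three sets: letting $\delta_1$ satisfy (5) and $\delta_n = \delta_1 n^{-1}\log^3 n$,

$$S_{k^*+1}(2\delta_0)^c = \{\theta \in \Theta_{k^*+1} : H(\theta) > \delta_0\},$$
$$S_n = S_{k^*+1}(2\delta_0) \cap S_{k^*+1}(2\delta_n)^c = \{\theta \in \Theta_{k^*+1} : \delta_n < H(\theta) \le \delta_0\},$$
$$S_{k^*+1}(2\delta_n) = \{\theta \in \Theta_{k^*+1} : H(\theta) \le \delta_n\}.$$

Note that $S_n$ can be empty. According to this decomposition, the quantity of interest is bounded by the sum of three terms (the second one vanishes when $S_n$ is empty): if $w_n = 3\pi(k^* + 1)\beta_1(\log n)^{\beta_2}n^{D_2(k^*)/2}$, then

(15) $P^{*n}\{B_n(k^* + 1) \ge (\beta_1(\log n)^{\beta_2}n^{D_2(k^*)/2})^{-1}\} \le P^{*n}\Big\{\displaystyle\int_{S_{k^*+1}(2\delta_0)^c} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/w_n\Big\} + P^{*n}\Big\{\int_{S_n} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/w_n\Big\} + P^{*n}\Big\{\int_{S_{k^*+1}(2\delta_n)} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/w_n\Big\}$.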

The Markov inequality, the Fubini theorem and O1 yield (as in the proof of Lemma 2) the following bound on the third term, $p_{n,3}$, of (15):

(16) $p_{n,3} \le w_n\pi_{k^*+1}\{S_{k^*+1}(2\delta_n)\} \le C_2 w_n\delta_n^{D_1(k^*+1)/2} \le 3\beta_1 C_2\pi(k^* + 1)\delta_1^{D_1(k^*+1)/2}\dfrac{(\log n)^{3D_1(k^*+1)/2+\beta_2}}{n^{[D_1(k^*+1)-D_2(k^*)]/2}}$.

The first term of (15), $p_{n,1}$, is like $P^{*n}\{B_n(k) > ce^{-n[H_{k+1}^* + \delta]}\}$, already bounded in the proof of Theorem 1. Indeed, the infima over $\theta \in S_{k^*+1}(2\delta_0)^c$ of $H(\theta)$, $V(\theta^*, \theta)$ and $V(\theta, \theta^*)$ are positive and the scheme of proof of Theorem 1 also applies here: there exist $C_4, C_5 > 0$ which do not depend on $n$ and guarantee that

(17) $p_{n,1} \le C_4 e^{-nC_5}$.

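When $\delta_n < \delta_0$, bounding the second term of (15), $p_{n,2}$, goes in four steps.

Let $A_n = \lfloor\delta_0/\delta_n\rfloor$. For all $j = 1, \ldots, A_n$, let $S_{n,j} = \{\theta \in \mathcal{F}_n \cap S_n : j\delta_n < H(\theta) \le (j + 1)\delta_n\}$. Consider $[l_i, u_i] \in \mathcal{H}(S_{n,j}, j\delta_n/4)$, define $\bar u_i = u_i/\mu u_i$ and introduce the local tests

$$\phi_{ij} = \mathbf{1}\{\ell_{n,\bar u_i} - \ell_n^* + nH(\bar u_i) > nj\delta_n/4\} = \phi_{n,f,\rho,c}$$

for $f = \bar u_i$, $\rho = j\delta_n/4$ and $c = 1$ in the perspective of Proposition B.1.

Step 1. Set $\theta \in S_{n,j}$ such that $f_\theta \in [l_i, u_i]$, $g = f_\theta$ and $\rho' = \log\mu u_i$. Then $\mu g = 1$, $V(g) = V(\theta) > 0$ and $H(\bar u_i) - (\rho + \rho') = P^*(\ell^* - \log u_i) - \rho = H(\theta) + P^*(\ell_\theta - \log u_i) - \rho \ge H(\theta) - P^*(\log u_i - \log l_i) - \rho \ge j\delta_n/2 > 0$. Thus, according to (30) of Proposition B.1,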

Egd - ^.,) < exp{- "l^'"-' - '^ + ^'" ( ^'"'V-^^ + "'' A 1 

Since $H(\theta) \le (j + 1)\delta_n \le 2\delta_0 \le e^{-2}$, then $\log^2 H(\theta) \le \log^2(j\delta_n)$ and (3) yield $V(\theta) \le C_1 H(\theta)\log^2 H(\theta) \le C_1(j + 1)\delta_n\log^2(j\delta_n)$. Consequently, $j/(j + 1) \ge 1/2$ and $8C_1\log^2(j\delta_n) \ge 1$ imply

Step 2. Proposition B.1 and (29) ensure that

T?*ni^ / "^^^^ ( i^^ . , 
The point is now to bound $V(\bar u_i)$. Let again $\theta \in S_{n,j}$ be such that $f_\theta \in [l_i, u_i]$. Using repeatedly $(a + b)^2 \le 2(a^2 + b^2)$ ($a, b \in \mathbb{R}$), the definition of a $\delta$-bracket and (3) yield

(19) $V(\theta^*, \bar u_i) = P^*(\ell^* - \log u_i + \log\mu u_i)^2 \le 2P^*(\ell^* - \log u_i)^2 + 2\log^2\mu u_i \le 4P^*(\ell^* - \ell_\theta)^2 + 4P^*(\ell_\theta - \log u_i)^2 + 2(\mu(u_i - l_i))^2 \le 4V(\theta) + 4P^*(\log u_i - \log l_i)^2 + 2(\mu(u_i - l_i))^2 \le 2(2C_1 + 3)(j + 1)\delta_n\log^2(j\delta_n)$,

and similarly,

(20) $V(\bar u_i, \theta^*) \le 4(C_1 + 2)(j + 1)\delta_n\log^2(j\delta_n)$.

A bound on $V(\bar u_i)$ is derived from (19) and (20), which yields in turn

(21) $E^{*n}\phi_{ij} \le \exp\Big\{-\dfrac{nj\delta_n}{64(C_1 + 2)\log^2(j\delta_n)}\Big\}$.

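Step 3. Now, consider the global test

$$\phi_n = \max\{\phi_{ij} : i \le \exp\{\mathcal{E}(S_{n,j}, j\delta_n/4)\}, j \le A_n\}.$$

Equation (18) implies that, for every $j \le A_n$ and $\theta \in S_{n,j}$,

(22) $E_\theta^n(1 - \phi_n) \le \exp\Big\{-\dfrac{nj\delta_n}{64C_1\log^2(j\delta_n)}\Big\}$.

Furthermore, if we set $\rho_n = n\delta_n/[64(1 + s)(C_1 + 2)\log^2\delta_n]$, then bounding $\phi_n$ by the sum of all $\phi_{ij}$, invoking (21) and (6) yield

$$E^{*n}\phi_n \le \sum_{j=1}^{A_n}\exp\Big\{\mathcal{E}(S_{n,j}, j\delta_n/4) - \frac{nj\delta_n}{64(C_1 + 2)\log^2(j\delta_n)}\Big\} \le \sum_{j\ge1}\exp\{-j\rho_n\} = \frac{\exp\{-\rho_n\}}{1 - \exp\{-\rho_n\}}.$$

Since $\delta_1 \ge 128(1 + s)(C_1 + 2)[D_1(k^* + 1) - D_2(k^*)] \vee \log^{-3}(n_0)$, one has $\log^2\delta_n \le 4\log^2 n$, and $\rho_n \ge \frac12[D_1(k^* + 1) - D_2(k^*)]\log n$. Thus, the final bound is

(23) $E^{*n}\phi_n \le \dfrac{1}{n^{[D_1(k^*+1)-D_2(k^*)]/2} - 1}$.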
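The arithmetic behind the last two sentences of Step 3 can be checked numerically. With $\delta_n = \delta_1 n^{-1}\log^3 n$ (the scaling consistent with the power $3D_1(k^*+1)/2$ of $\log n$ in (8)) and $n^{-2} \le \delta_n < 1$, one has $\log^2\delta_n \le 4\log^2 n$, hence $\rho_n \ge \delta_1\log n/[256(1 + s)(C_1 + 2)]$, which is at least $\frac12[D_1(k^*+1) - D_2(k^*)]\log n$ under (5); then $e^{-\rho_n}/(1 - e^{-\rho_n}) = 1/(e^{\rho_n} - 1)$ is at most the right-hand side of (23). The Python sketch below evaluates these quantities; the constants $s$, $C_1$, $D_1$, $D_2$ used in the check are arbitrary illustrative values, not taken from any model in the paper.

```python
import math


def step3_rates(n, delta1, s, C1, D1, D2):
    """Quantities from Step 3 of the proof of Theorem 3 (illustrative sketch)."""
    log_n = math.log(n)
    delta_n = delta1 * log_n ** 3 / n
    if not 0 < delta_n < 1:
        raise ValueError("n must be large enough that delta_n < 1")
    rho_n = n * delta_n / (64 * (1 + s) * (C1 + 2) * math.log(delta_n) ** 2)
    # Lower bound on rho_n obtained from log^2 delta_n <= 4 log^2 n.
    lower = delta1 * log_n / (256 * (1 + s) * (C1 + 2))
    target = 0.5 * (D1 - D2) * log_n
    bound_23 = 1.0 / (n ** ((D1 - D2) / 2) - 1)   # right-hand side of (23)
    return delta_n, rho_n, lower, target, bound_23
```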
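Step 4. We now bound $p_{n,2}$:

$$p_{n,2} = E^{*n}(\phi_n + (1 - \phi_n))\mathbf{1}\Big\{\int_{S_n} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/w_n\Big\} \le E^{*n}\phi_n + P^{*n}\Big\{\int_{S_n\setminus\mathcal{F}_n} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/2w_n\Big\} + E^{*n}(1 - \phi_n)\mathbf{1}\Big\{\int_{S_n\cap\mathcal{F}_n} e^{\ell_n(\theta) - \ell_n^*}\,d\pi_{k^*+1}(\theta) \ge 1/2w_n\Big\}.$$

The first term of the right-hand side is bounded according to (23). Moreover, applying the Markov inequality and the Fubini theorem to the second term above, $p_{n,2,2}$, ensures that

(24) $p_{n,2,2} \le 6\beta_1 C_3\dfrac{(\log n)^{\beta_2}}{n^{[D_1(k^*+1)-D_2(k^*)]/2}}$.

As for the third term, $p_{n,2,3}$, invoking again the Markov inequality and the Fubini theorem, then (22), yields

(25) $p_{n,2,3} \le 2w_n\displaystyle\sum_{j=1}^{A_n}\int_{S_{n,j}} E_\theta^n(1 - \phi_n)\,d\pi_{k^*+1}(\theta) \le 2w_n\sum_{j=1}^{A_n}\exp\Big\{-\frac{nj\delta_n}{64C_1\log^2(j\delta_n)}\Big\}\pi_{k^*+1}\{S_{n,j}\} \le 2w_n\exp\Big\{-\frac{n\delta_n}{64C_1\log^2\delta_n}\Big\} \le 6\beta_1\frac{(\log n)^{\beta_2}}{n^{[D_1(k^*+1)-D_2(k^*)]/2}}$.

Combining inequalities (23), (24) and (25) yields

$$p_{n,2} \le \frac{1}{n^{[D_1(k^*+1)-D_2(k^*)]/2} - 1} + 6\beta_1(C_3 + 1)\frac{(\log n)^{\beta_2}}{n^{[D_1(k^*+1)-D_2(k^*)]/2}}.$$

Inequalities (16), (17) and the one above conclude the proof. □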



5. Mixtures proofs. In the sequel we use the notation $\theta^* = (p^*, \gamma^*)$, $p^* = (p_1^*, \ldots, p_{k^*-1}^*)$ and $p_{k^*}^* = 1 - \sum_{j=1}^{k^*-1}p_j^*$. Also, if $\theta = (p, \gamma) \in \Theta_k$, then $1 - \sum_{j=1}^{k-1}p_j$ is denoted by $p_k$.

The standard conditions hold. Assumption A1 is verified by proving (with usual regularity and convexity arguments) the existence of $a > 0$ such that the function $\theta \mapsto P^* e^{a(\ell^* - \ell_\theta)}$ is bounded on $\Theta_{k^*}$. Assumption A2 follows from M2. Lemma 3 in [12] guarantees that $H_k^* > H_{k+1}^*$ (every $k < k^*$).
So, the underestimation error bound (10) in Theorem 4 is a consequence of Theorem 1.

The overestimation error bound (11) in Theorem 4 is a consequence of Theorem 3. Let us verify that O1($k^* + 1$), O2($k^* + 1$) and O3 are satisfied.

Proposition 1. There exists $C_2 > 0$ such that, in the setting of mixture models, for every sequence $\{\delta_n\}$ that decreases to 0, for all $n \ge 1$,

Proposition 2. If $\mathcal{F}_n = \{(p, \gamma) \in \Theta_{k^*+1} : \min_{j\le k^*+1}p_j \ge e^{-n}\}$ approximates the set $\Theta_{k^*+1}$, then O2($k^* + 1$) is fulfilled. Furthermore, the entropy condition (6) holds as soon as $\delta_1$ is chosen large enough.

The technical proofs of Propositions 1 and 2 are postponed to Appendices C
and D, respectively. Assumption O3 is obtained (with $\beta_2 = 0$) from the
Laplace expansion under $P^*$, which is regular (see also the comment after
Theorem 3). Finally, Theorem 3 applies and Theorem 4 is proven.

APPENDIX A: PROOF OF LEMMA 1 

Let $\theta^* = (a^*, t^*)$ and $\theta \in \Theta_{k^*+1}$ satisfy $H(\theta) \le \delta_n$. For every $j \le k^*$ (resp.
$j \le k$), we denote by $\tau_j^*$ (resp. $\tau_j$) the interval $[t_{j-1}^*, t_j^*[$ (resp. $[t_{j-1}, t_j[$)
[hence, $H(\theta) = \sum_{j \le k^*}\sum_{j' \le k^*+1}(a_j^* - a_{j'})^2\mu(\tau_j^* \cap \tau_{j'})$], and set $s(j)$ such
that $\mu(\tau_j^* \cap \tau_{s(j)}) = \max_{l \le k}\mu(\tau_j^* \cap \tau_l)$. So, $\mu(\tau_j^* \cap \tau_{s(j)}) \ge \mu(\tau_j^*)/k$, and $(a_j^* - a_{s(j)})^2 \le c\delta_n$ for all $j \le k^*$. If $s(j) = s(j')$ for $j' > j$, then necessarily $j' \ge j+1$ and $s(j+1) = s(j)$, while $a_j^* \ne a_{j+1}^*$, so we do get $k^*$ conditions
on $\theta$. Suppose now without loss of generality that $s(j) = j$ for all $j \le k^*$.
Then $(a_{k^*}^* - a_{k^*+1})^2(1 - t_{k^*}) \le \delta_n$, another condition on $\theta$. Moreover, for all
$j < k^*$, $\mu(\tau_j^*) - \mu(\tau_j^* \cap \tau_j) = \mu(\tau_j^* \cap \tau_{j-1}) + \mu(\tau_j^* \cap \tau_{j+1})$, $\mu(\tau_j) - \mu(\tau_j^* \cap \tau_j) = \mu(\tau_j \cap \tau_{j-1}^*) + \mu(\tau_j \cap \tau_{j+1}^*)$ (with convention $\tau_0 = \tau_0^* = \emptyset$) and $a_j^* \ne a_{j+1}^*$
imply $|\mu(\tau) - \mu(\tau_j^* \cap \tau_j)| \le c\delta_n$ for $\tau \in \{\tau_j^*, \tau_j\}$. So, $|(t_j^* - t_j) - (t_{j-1}^* - t_{j-1})| \le 2c\delta_n$. Using successively these inequalities from $j = 1$ to $j = k^* - 1$, we get
$k^* - 1$ conditions on $\theta$ of the form $|t_j^* - t_j| \le c\delta_n$. Combining those conditions yields O1($k^*+1$) with $D_1(k^*+1) = D(k^*) + k^*$.

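Let $S_n = \{t^* + u/n : u \in \mathbb{N}^{k^*+1}, u_0 = u_{k^*} = 0, |u|_1 \le \log\log n\} \subset T_{k^*}$. For
large $n$, there exists an event of probability $1 - (1 - \min_j |t_j^* - t_{j-1}^*|/2)^n$ upon
which the model is regular in $a$ for any fixed $t \in S_n$; hence, there exists $C > 0$
(independent of $t$) such that, on that event,

$$\int_{\mathbb{R}^{k^*}} e^{\ell_n(a,t)-\ell_n^*}\,d\pi_{k^*}(a \mid t) \ge \frac{C}{n^{k^*/2}}\,e^{\ell_n(\hat a_t,t)-\ell_n^*} \ge \frac{C}{n^{k^*/2}}\,e^{\ell_n(a^*,t)-\ell_n^*}, \qquad (25)$$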

where $\hat a_t$ is the maximum likelihood estimator for fixed $t$. Denote $n_j(t) = \sum_{i=1}^n \mathbb{1}\{X_i \in [t_j^*, t_j^* + u_j/n[\}$ and $v^2(t) = \sigma^{-2}\sum_{j=1}^{k^*-1}(a_j^* - a_{j+1}^*)^2 n_j(t)$ for any
$t \in S_n$. Then $\xi(t) = \ell_n(a^*,t) - \ell_n^* + \frac{1}{2}v^2(t)$ is, conditionally on $X_1, \ldots, X_n$,
a centered Gaussian r.v. with variance $v^2(t)$. Because each $n_j(t)$ is
$\mathrm{Binomial}(n, u_j/n)$ distributed, the Chernoff method implies, for any $t \in S_n$,

$$P^{*n}\{v^2(t) > \tau\log n\} = O(1/\sqrt{n}). \qquad (26)$$
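The Chernoff method controls the upper tail of a binomial count through its moment generating function. Optimising over the exponent gives $P\{B \ge x\} \le \exp\{-n\,\mathrm{KL}(x/n \,\|\, p)\}$ for $B \sim \mathrm{Binomial}(n,p)$ and $x/n > p$, where KL is the Kullback-Leibler divergence between Bernoulli laws. The Python sketch below compares this bound with the exact tail for a single count. The values $n = 1000$ and $u_j = 3$ (so $p = u_j/n$) are illustrative assumptions. The sketch shows the tail bound only, not the exact statement or constants of (26).

```python
import math


def binom_upper_tail(n, p, x):
    # exact P{B >= x} for B ~ Binomial(n, p) and an integer threshold x
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))


def chernoff_bound(n, p, x):
    # optimised Chernoff bound exp(-n KL(x/n || p)); trivial (= 1) when x/n <= p
    q = x / n
    if q <= p:
        return 1.0
    if q >= 1.0:
        return p ** n
    kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))
    return math.exp(-n * kl)


if __name__ == "__main__":
    n, u = 1000, 3  # n_j(t) ~ Binomial(n, u_j / n) with u_j = 3 (illustrative)
    for x in (6, 10, 20):
        print(x, binom_upper_tail(n, u / n, x), chernoff_bound(n, u / n, x))
```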

Moreover, since $\xi(t)$ is conditionally Gaussian, it is easily seen by using (26)
that, for any $t \in S_n$, setting $S = \{Z^n : \ell_n(a^*,t) - \ell_n^* > -(v^2(t) + \tau\log n)\}$,

$$P^{*n}\{S^c\} = O(1/\sqrt{n}), \qquad (27)$$

too. Now, the same technique as in the proof of Lemma 2 yields

$$P^{*n}\Big(\int e^{\ell_n(a^*,t)-\ell_n^*}\,d\pi_{k^*}(t) < \cdots\Big) \le \cdots \qquad (28)$$

whenever $\pi_{k^*}\{S_n\} = c(\log\log n/n)^{\cdots} \ge 2n^{\cdots}$. By combining (25), (27)
and (28), we obtain that O3 holds with $D_2(k^*) = 3k^* + 2(r-1)$.

APPENDIX B: CONSTRUCTION OF TESTS 

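Proposition B.1. Let $(\rho, c)$ belong to $\mathbb{R}_+^* \times (0, 1]$ and $f \in L^1(\mu) \setminus \{0\}$.
Assume that $V(f)$ is positive and finite. Let $\ell_{n,f} = \sum_{i=1}^n \ell_f(Z_i)$ and

$$\phi_{n,f,\rho,c} = \mathbb{1}\{\ell_{n,f} - \ell_n^* + nH(f) \ge n\rho + \log c\}.$$

The following bound holds:

$$cE^{*n}\phi_{n,f,\rho,c} \le \exp\Big\{-\frac{n\rho}{2}\Big(\frac{\rho}{V(f)} \wedge 1\Big)\Big\}. \qquad (29)$$

Let $\rho' \in \mathbb{R}_+$ and $g \in L_+^1(\mu)$ be such that $\mu g = 1$, $g \le e^{\rho'} f$ and $V(g)$ is
finite. If, in addition, $(\rho + \rho') < H(f)$, then the following bound holds true:

$$E_g^n(1 - \phi_{n,f,\rho,c}) \le \exp\Big\{-\frac{n[H(f) - (\rho+\rho')]}{2}\Big(\frac{H(f) - (\rho+\rho')}{V(g)} \wedge 1\Big)\Big\}. \qquad (30)$$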

Proof. $H(f)$ is also finite. Let us denote $\log f$ by $\ell_f$, $\log g$ by $\ell_g$ and
set $s \in (0,1]$. Then

$$cE^{*n}\phi_{n,f,\rho,c} = cP^{*n}\{\ell_{n,f} - \ell_n^* \ge n\rho - nH(f) + \log c\}.$$

A Taylor expansion of the function $t \mapsto P^* e^{t(\ell_f - \ell^*)}$ implies that

$$P^* e^{s(\ell_f - \ell^*)} = 1 - sH(f) + s^2\int_0^1 (1-t)\int (f^*)^{1-st} f^{st}(\ell_f - \ell^*)^2\,d\mu\,dt \le 1 - sH(f) + s^2 V(f)/2,$$

by a Hölder inequality with parameters $1/(st)$ and $1/(1 - st)$. Moreover, since
$\log t \le t - 1$ ($t > 0$), we have

$$cE^{*n}\phi_{n,f,\rho,c} \le \exp[-ns\rho + ns^2 V(f)/2].$$

The choice $s = 1 \wedge \rho/V(f)$ yields (29). Similarly, for all $s \in (0, 1]$,
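$$E_g^n(1 - \phi_{n,f,\rho,c}) \le P_g^n\{\ell_n^* - \ell_{n,f} > n[H(f) - \rho]\}$$
$$\le P_g^n\{\ell_n^* - \ell_{n,g} > n[H(f) - (\rho + \rho')]\}$$
$$\le e^{-ns[H(f) - (\rho+\rho')]}\big(P_g e^{s(\ell^* - \ell_g)}\big)^n.$$

The same arguments as before lead to $P_g e^{s(\ell^* - \ell_g)} \le 1 + s^2 V(g)/2$ and

$$E_g^n(1 - \phi_{n,f,\rho,c}) \le \exp\{-ns[H(f) - (\rho + \rho')] + ns^2 V(g)/2\}.$$

The choice $s = 1 \wedge [H(f) - (\rho + \rho')]/V(g)$ yields (30). □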
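Both (29) and (30) come from minimising an exponent of the form $-sa + s^2V/2$ over $s \in (0,1]$. For (29), $a = \rho$ and $V = V(f)$; for (30), $a = H(f) - (\rho+\rho')$ and $V = V(g)$. The choice $s = 1 \wedge a/V$ minimises this exponent on $(0,1]$. The resulting exponent is at most $-(a/2)(1 \wedge a/V)$, with equality when $a \le V$. A short Python check with illustrative values of $a$ and $V$:

```python
def exponent(s, a, v):
    # exponent -s*a + s^2*V/2 appearing in the exponential bounds of the proof
    return -s * a + 0.5 * s * s * v


def best_s(a, v):
    # the choice s = min(1, a / V) made in the proof
    return min(1.0, a / v)


def grid_min(a, v, m=100000):
    # brute-force minimum of the exponent over the grid {1/m, 2/m, ..., 1}
    return min(exponent(k / m, a, v) for k in range(1, m + 1))


if __name__ == "__main__":
    for a, v in [(0.2, 1.0), (1.5, 1.0), (0.5, 0.5)]:
        s = best_s(a, v)
        print(a, v, s, exponent(s, a, v), grid_min(a, v), -(a / 2) * min(1.0, a / v))
```

The case $a > V$ (here $a = 1.5$, $V = 1$) shows why only an upper bound, not an equality, is obtained when $s = 1$.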

APPENDIX C: PROOF OF PROPOSITION 1 

Let $\{\delta_n\}$ be a decreasing sequence of positive numbers which tend to 0.
Let us denote by $\|\cdot\|$ the $L^1(\mu)$ norm. Because $\sqrt{H(\theta)} \ge \|f^* - f_\theta\|/2$, M1
ensures that Proposition 1 holds if

$$\pi_{k^*+1}\{\theta \in \Theta_{k^*+1} : \|f^* - f_\theta\| \le \sqrt{\delta_n}\} \le C_2\,\delta_n^{k^*(d+1)/2} \qquad (31)$$

for some $C_2 > 0$ which does not depend on $\{\delta_n\}$. We use a new parameterization for translating $\|f^* - f_\theta\| \le \sqrt{\delta_n}$ in terms of parameters $p$ and $\gamma$. It is
a variant of the locally conic parameterization [6], using the $L^1$ norm instead
of the $L^2$ norm. In the sequel, $c$, $C$ will be generic positive constants.

$L^1$ locally conic parameterization. For each $\theta = (p, \gamma) \in \mathrm{int}(\Theta_{k^*+1})$, we
define iteratively the permutation $\sigma_\theta$ upon $\{1, \ldots, k^*+1\}$ as follows:

• $(j_1, \sigma_\theta(j_1)) = \min_{(j,j')} \arg\min\{|\gamma_j^* - \gamma_{j'}|_1 : j \le k^*, j' \le k^*+1\}$, where the
first minimum is for the lexicographic ranking;

• if $(j_1, \sigma_\theta(j_1)), \ldots, (j_{l-1}, \sigma_\theta(j_{l-1}))$ with $l \le k^*$ have been defined, then $(j_l, \sigma_\theta(j_l)) = \min_{(j,j')} \arg\min\{|\gamma_j^* - \gamma_{j'}|_1\}$, where in the argmin, index $j \le k^*$
does not belong to $\{j_1, \ldots, j_{l-1}\}$ and index $j' \le k^*+1$ does not belong to
$\{\sigma_\theta(j_1), \ldots, \sigma_\theta(j_{l-1})\}$;

• once $(j_1, \sigma_\theta(j_1)), \ldots, (j_{k^*}, \sigma_\theta(j_{k^*}))$ are defined, the value of $\sigma_\theta(k^*+1)$ is
uniquely determined.
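The three bullet points describe a greedy matching. At each step, the closest remaining pair $(\gamma_j^*, \gamma_{j'})$ in $|\cdot|_1$ is matched, with ties broken lexicographically in $(j, j')$. The index left over at the end is assigned to $k^*+1$. The Python sketch below is one reading of that construction. Indices are 0-based, locations are tuples, and the function names are hypothetical.

```python
def l1_distance(x, y):
    # |x - y|_1 for two location vectors given as tuples
    return sum(abs(a - b) for a, b in zip(x, y))


def greedy_permutation(gamma_star, gamma):
    """Return sigma as a list: sigma[j] is the index of gamma matched to gamma_star[j];
    sigma[k] (k = len(gamma_star)) is the leftover index. Indices are 0-based."""
    k = len(gamma_star)
    if len(gamma) != k + 1:
        raise ValueError("gamma must have exactly one more component than gamma_star")
    sigma = {}
    free_j, free_jp = set(range(k)), set(range(k + 1))
    for _ in range(k):
        # minimising (distance, j, j') picks an argmin and breaks ties lexicographically
        _, j, jp = min((l1_distance(gamma_star[j], gamma[jp]), j, jp)
                       for j in free_j for jp in free_jp)
        sigma[j] = jp
        free_j.remove(j)
        free_jp.remove(jp)
    sigma[k] = free_jp.pop()  # sigma(k* + 1) is the single remaining index
    return [sigma[j] for j in range(k + 1)]


if __name__ == "__main__":
    print(greedy_permutation([(0.0,), (5.0,)], [(5.1,), (0.2,), (9.0,)]))
```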

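We can assume without loss of generality that $\sigma_\theta = \mathrm{id}$, the identity permutation over $\{1, \ldots, k^*+1\}$. Indeed, for every $\theta = (p, \gamma) \in \Theta_{k^*+1}$ and
each permutation $\varsigma$ onto $\{1, \ldots, k^*+1\}$, let $\theta^\varsigma = (p^\varsigma, \gamma^\varsigma) \in \Theta_{k^*+1}$ be the
parameter with coordinates $p_j^\varsigma = p_{\varsigma(j)}$, $\gamma_j^\varsigma = \gamma_{\varsigma(j)}$ (all $j \le k^*+1$), and denote by $\pi_{k^*+1}^\varsigma$ the image of $\pi_{k^*+1}$ under the mapping $\theta \mapsto \theta^\varsigma$.
Since for all $\theta$ and $\varsigma$, $\|f^* - f_\theta\| = \|f^* - f_{\theta^\varsigma}\|$,

$$\pi_{k^*+1}\{\theta \in \Theta_{k^*+1} : \|f^* - f_\theta\| \le \sqrt{\delta_n}\} \le \sum_\varsigma \pi_{k^*+1}^\varsigma\{\theta \in \Theta_{k^*+1} : \sigma_\theta = \mathrm{id}, \|f^* - f_\theta\| \le \sqrt{\delta_n}\}, \qquad (32)$$

where the sum above is on all possible permutations.

We show below that the term in the sum above associated with $\varsigma = \mathrm{id}$ is
bounded by a constant times $\delta_n^{k^*(d+1)/2}$. The proof involves only properties that all $\pi_{k^*+1}^\varsigma$ share. Studying the latter term is therefore sufficient to
conclude that Proposition 1 holds.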

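Set $\Theta^* = \{\theta \in \Theta_{k^*+1} : \sigma_\theta = \mathrm{id}\}$. For all $\theta \in \Theta^*$, let $\gamma_\theta = \gamma_{k^*+1}$, $p_\theta = p_{k^*+1}$
and $R_\theta = (\rho_1, \ldots, \rho_{k^*-1}, r_1, \ldots, r_{k^*})$, where

$$\rho_j = \frac{p_j - p_j^*}{p_\theta} \quad\text{and}\quad r_j = \frac{\gamma_j - \gamma_j^*}{p_\theta} \qquad (j \le k^*).$$

Note that $\sum_{j \le k^*}\rho_j = -1$. Now, define

$$N(\gamma_\theta, R_\theta) = \Big\|g_{\gamma_\theta} + \sum_{j=1}^{k^*} p_j^* r_j^\top \nabla g_{\gamma_j^*} + \sum_{j=1}^{k^*}\rho_j g_{\gamma_j^*}\Big\|;$$

then $t_\theta = p_\theta N(\gamma_\theta, R_\theta)$.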
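For $\theta \in \Theta^*$ the reparameterization is explicit: $p_\theta = p_{k^*+1}$, $\gamma_\theta = \gamma_{k^*+1}$, $\rho_j = (p_j - p_j^*)/p_\theta$, $r_j = (\gamma_j - \gamma_j^*)/p_\theta$ and $t_\theta = p_\theta N(\gamma_\theta, R_\theta)$. The Python sketch below computes these coordinates, including $\rho_{k^*}$, which is determined by $\sum_{j\le k^*}\rho_j = -1$. It assumes, for illustration only, a one-dimensional Gaussian location mixture: $g_\gamma$ is the $N(\gamma,1)$ density, so $d = 1$ and $\nabla g_\gamma(z) = (z-\gamma)g_\gamma(z)$. The $L^1(\mu)$ norm in $N$ is approximated by a Riemann sum on a finite grid, whose bounds are also assumptions.

```python
import math


def gauss(z, mean):
    # assumed component density g_gamma: the N(gamma, 1) density (d = 1)
    return math.exp(-0.5 * (z - mean) ** 2) / math.sqrt(2.0 * math.pi)


def conic_coordinates(p, gamma, p_star, gamma_star, lo=-10.0, hi=15.0, m=5001):
    """Coordinates (p_theta, gamma_theta, rho, r, t_theta) of theta, assuming sigma_theta = id.

    p, gamma: the k*+1 weights and locations of theta;
    p_star, gamma_star: the k* weights and locations of theta*.
    """
    k = len(p_star)
    p_theta, gamma_theta = p[k], gamma[k]
    rho = [(p[j] - p_star[j]) / p_theta for j in range(k)]
    r = [(gamma[j] - gamma_star[j]) / p_theta for j in range(k)]
    h = (hi - lo) / (m - 1)
    norm = 0.0
    for i in range(m):
        z = lo + i * h
        val = gauss(z, gamma_theta)
        for j in range(k):
            g = gauss(z, gamma_star[j])
            val += p_star[j] * r[j] * (z - gamma_star[j]) * g  # p*_j r_j times derivative of g at gamma*_j
            val += rho[j] * g                                    # rho_j g_{gamma*_j}
        norm += abs(val) * h  # Riemann sum approximating the L1 norm N(gamma_theta, R_theta)
    return p_theta, gamma_theta, rho, r, p_theta * norm


if __name__ == "__main__":
    print(conic_coordinates([0.38, 0.57, 0.05], [0.1, 2.9, 6.0], [0.4, 0.6], [0.0, 3.0]))
```

In this example $p_\theta = 0.05$, $\rho = (-0.4, -0.6)$ and $r = (2, -2)$. The computed $t_\theta$ stays far below the bound $2 + \sum_j p_j^*\|(\gamma_j - \gamma_j^*)^\top\nabla g_{\gamma_j^*}\| \approx 2.08$ of Lemma C.1's proof.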



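Lemma C.1. For all $\theta \in \Theta^*$, let $\Psi(\theta) = (t_\theta, \gamma_\theta, R_\theta)$. The function $\Psi$ is
a bijection between $\Theta^*$ and $\Psi(\Theta^*)$. Furthermore, $T = \sup_{\theta \in \Theta^*} t_\theta$ is finite, so
that the projection of $\Psi(\Theta^*)$ along its first coordinate is included in $[0, T]$.
Finally, for all $\varepsilon > 0$, there exists $\eta > 0$ such that, for every $\theta \in \Theta^*$, $\|f^* - f_\theta\| \le \eta$ yields $t_\theta \le \varepsilon$.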

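Proof. It is readily seen that $\Psi$ is a bijection. We point out that
$N(\gamma, R)$ is necessarily positive for all $(t, \gamma, R) \in \Psi(\Theta^*)$, by virtue of M5.
As for the finiteness of $T$, note that, for any $\theta \in \Theta^*$,

$$t_\theta = \Big\|p_\theta g_{\gamma_\theta} + \sum_{j=1}^{k^*} p_j^*(\gamma_j - \gamma_j^*)^\top\nabla g_{\gamma_j^*} + \sum_{j=1}^{k^*}(p_j - p_j^*)g_{\gamma_j^*}\Big\| \le 2 + \sum_{j=1}^{k^*} p_j^*\|(\gamma_j - \gamma_j^*)^\top\nabla g_{\gamma_j^*}\|, \qquad (33)$$

since $p_\theta + \sum_{j \le k^*}|p_j - p_j^*| \le p_\theta + \sum_{j \le k^*}(p_j + p_j^*) = 2$.
The right-hand side term above is finite because $\Gamma$ is bounded and the $\|(\nabla g_{\gamma_j^*})_l\|$
($j \le k^*$, $l \le d$) are finite thanks to M4. Hence, $T$ is finite.

The last part of the lemma is a straightforward consequence of the compactness of $\Gamma$ and continuity of $\theta \mapsto f_\theta(z)$. □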
Proof of (31). For any $\tau > 0$, define the sets

$$B_1^\tau = \{\theta \in \Theta^* : \min_{j \le k^*}|\gamma_\theta - \gamma_j^*|_1 > \tau,\ \|f^* - f_\theta\| \le \sqrt{\delta_n}\}$$

and

$$B_2^\tau = \{\theta \in \Theta^* : \min_{j \le k^*}|\gamma_\theta - \gamma_j^*|_1 \le \tau,\ \|f^* - f_\theta\| \le \sqrt{\delta_n}\}.$$

Inequality (31) is a consequence of the following:



Lemma C.2. Given $\tau > 0$, there exists $C > 0$ such that, for all $n \ge 1$,

$$\pi_{k^*+1}\{B_1^\tau\} \le C\,\delta_n^{k^*(d+1)/2}. \qquad (34)$$

Lemma C.3. There exist $\tau, C > 0$ such that, for all $n \ge 1$,

$$\pi_{k^*+1}\{B_2^\tau\} \le C\,\delta_n^{k^*(d+1)/2}. \qquad (35)$$



Because $\Gamma$ is compact, continuity arguments on the norm in finite-dimensional spaces yield the following useful property: under M5, if $g_1, \ldots, g_k \in L^1(\mu)$ are $k$ functions such that, for every $\gamma \in \Gamma$, $g_\gamma, g_1, \ldots, g_k$ are linearly independent, then there exists $c > 0$ such that, for all $a = (a_0, \ldots, a_k) \in \mathbb{R}^{k+1}$
and $\gamma \in \Gamma$,

$$\Big\|a_0 g_\gamma + \sum_{j=1}^k a_j g_j\Big\| \ge c\sum_{j=0}^k |a_j|. \qquad (36)$$



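Proof of Lemma C.2. Let $\tau > 0$, let $(t, \gamma, R) \in \Psi(\Theta^*)$ and $\theta = (p, \gamma) = \Psi^{-1}(t, \gamma, R)$ satisfy $|\gamma_\theta - \gamma_j^*|_1 > \tau$ for all $j \le k^*$ and $\|f^* - f_\theta\| \le \sqrt{\delta_n}$. Given
any $z \in \mathcal{Z}$, a Taylor-Lagrange expansion (in $t$) of $[f^*(z) - f_\theta(z)]$ yields the
existence of $t^\circ \in (0, t)$ (depending on $z$) such that

$$|f^*(z) - f_\theta(z)| \ge \frac{t}{N}\Big|g_\gamma(z) + \sum_{j=1}^{k^*} p_j^* r_j^\top\nabla g_{\gamma_j^*}(z) + \sum_{j=1}^{k^*}\rho_j g_{\gamma_j^*}(z)\Big| - \frac{t^2}{N^2}\,|\cdots|,$$

where $\gamma_j^\circ = \gamma_j^* + t^\circ r_j/N$ and $p_j^\circ = p_j^* + t^\circ\rho_j/N$ (all $j \le k^*$). Therefore, by
virtue of M4, there exists $C > 0$ such that

$$\|f^* - f_\theta\| \ge t\,[1 - C\,\cdots]. \qquad (37)$$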
Furthermore, M5 and (36) imply that, for some $c > 0$ (depending on $\tau$),

$$N \ge c\Big(1 + \sum_{j=1}^{k^*}(|\rho_j| + p_j^*|r_j|_1)\Big), \qquad (38)$$

so the following lower bound on $\|f^* - f_\theta\|$ is deduced from (37):

$$\|f^* - f_\theta\| \ge t\Big[1 - C\sum_{j=1}^{k^*}(|p_j - p_j^*| + p_j^*|\gamma_j - \gamma_j^*|_1)\Big]. \qquad (39)$$

By mimicking the last part of the proof of Lemma C.1, we obtain that the
right-hand term in (39) is larger than $t/2$ for $n$ large enough (independently
of $\theta$). Because $t = p_\theta N$ and (38) holds, there exists $c > 0$ such that

$$\pi_{k^*+1}\{B_1^\tau\} \le \pi_{k^*+1}\Big\{\theta \in \Theta^* : \sum_{j=1}^{k^*}(|p_j - p_j^*| + p_j^*|\gamma_j - \gamma_j^*|_1) \le c\sqrt{\delta_n}\Big\},$$

leading to (34) by virtue of M1. □



Proof of Lemma C.3. Let $\tau > 0$ and $\theta = (p, \gamma) \in \Theta^*$ satisfying $\|f^* - f_\theta\| \le \sqrt{\delta_n}$. Assume that $|\gamma_\theta - \gamma_j^*|_1 \le \tau$ for some $j \le k^*$, say, $j = 1$. By construction of $\Theta^*$, $|\gamma_1 - \gamma_1^*|_1 \le |\gamma_\theta - \gamma_1^*|_1 \le \tau$, and $\tau$ can be chosen small enough
so that $\gamma_\theta$ must be different from $\gamma_j^*$ for every $j = 2, \ldots, k^*$. We consider
without loss of generality that $\gamma_\theta \notin \{\gamma_j^* : j \le k^*\}$.

Lemma C.1 implies that $|\gamma_j - \gamma_j^*|_1$ and $|p_j - p_j^*|$ go to 0 as $n \uparrow \infty$ for every
$j = 2, \ldots, k^*$. This yields that $|p_1 + p_\theta - p_1^*|$ goes to 0 as $n \uparrow \infty$. Therefore,
by virtue of M5 and (36), there exist $c, C > 0$ such that, for $n$ large enough,



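$$\begin{aligned}
\|f^* - f_\theta\| \ge{}& C\Big(\sum_{j=2}^{k^*}|p_j - p_j^*| + \sum_{j=2}^{k^*}p_j^*|\gamma_j - \gamma_j^*|_1 + |p_1 + p_\theta - p_1^*| \\
&+ \sum_{l=1}^{d}\big|p_1(\gamma_1 - \gamma_1^*)_l + p_\theta(\gamma_\theta - \gamma_1^*)_l\big| \\
&+ \sum_{(r,s)\notin A}\big|\varepsilon_{rs}[p_\theta(\gamma_\theta - \gamma_1^*)_r(\gamma_\theta - \gamma_1^*)_s + p_1(\gamma_1 - \gamma_1^*)_r(\gamma_1 - \gamma_1^*)_s]\big| \\
&+ \sum_{(r,s)\in A}\big[p_\theta|(\gamma_\theta - \gamma_1^*)_r(\gamma_\theta - \gamma_1^*)_s| + p_1|(\gamma_1 - \gamma_1^*)_r(\gamma_1 - \gamma_1^*)_s|\big]\Big) \\
&- c\Big(p_\theta|\gamma_\theta - \gamma_1^*|_1^3 + p_1|\gamma_1 - \gamma_1^*|_1^3 + \sum_{j=2}^{k^*}|\gamma_j - \gamma_j^*|_1^2\Big) \\
={}& C\Delta_1 - c\Delta_2. \qquad (40)
\end{aligned}$$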

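Since $|\gamma_j - \gamma_j^*|_1$ goes to 0 for $j = 2, \ldots, k^*$, $\sum_{j=2}^{k^*}|\gamma_j - \gamma_j^*|_1^2$ can be neglected
compared to $\sum_{j=2}^{k^*}p_j^*|\gamma_j - \gamma_j^*|_1$ when $n$ is large enough. If $C\Delta_1 \le 2c\Delta_2$, then
$\sum_{j=2}^{k^*}|p_j - p_j^*| \le 2c\Delta_2$, so that $|p_1 + p_\theta - p_1^*| \le 2c\Delta_2$, which yields in turn

$$\sum_{(r,s)\notin A}\big|\varepsilon_{rs}[p_\theta(\gamma_\theta - \gamma_1^*)_r(\gamma_\theta - \gamma_1^*)_s + p_1(\gamma_1 - \gamma_1^*)_r(\gamma_1 - \gamma_1^*)_s]\big| + \sum_{(r,s)\in A}\big[p_\theta|(\gamma_\theta - \gamma_1^*)_r(\gamma_\theta - \gamma_1^*)_s| + p_1|(\gamma_1 - \gamma_1^*)_r(\gamma_1 - \gamma_1^*)_s|\big] \le 4c\Delta_2.$$

Consequently, M5 guarantees the existence of $C' > 0$ such that

$$p_\theta|\gamma_\theta - \gamma_1^*|_2^2 + p_1|\gamma_1 - \gamma_1^*|_2^2 \le C'(p_\theta|\gamma_\theta - \gamma_1^*|_1^3 + p_1|\gamma_1 - \gamma_1^*|_1^3),$$

which is impossible when $\tau$ is chosen small enough. Therefore, $C\Delta_1 > 2c\Delta_2$
and (40) together with M5 give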

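$$\begin{aligned}
\|f^* - f_\theta\| \ge{}& c\Big(\sum_{j=2}^{k^*}|p_j - p_j^*| + \sum_{j=2}^{k^*}p_j^*|\gamma_j - \gamma_j^*|_1 + |p_1 + p_\theta - p_1^*| \\
&+ p_\theta|\gamma_\theta - \gamma_1^*|_2^2 + p_1|\gamma_1 - \gamma_1^*|_2^2 + \sum_{l=1}^{d}\big|p_1(\gamma_1 - \gamma_1^*)_l + p_\theta(\gamma_\theta - \gamma_1^*)_l\big| \\
&+ \sum_{(r,s)\notin A}\big|\varepsilon_{rs}[p_\theta(\gamma_\theta - \gamma_1^*)_r(\gamma_\theta - \gamma_1^*)_s + p_1(\gamma_1 - \gamma_1^*)_r(\gamma_1 - \gamma_1^*)_s]\big|\Big)
\end{aligned}$$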
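for some $c > 0$. Finally,

$$|p_1 + p_\theta - p_1^*| + \sum_{j=2}^{k^*}|p_j - p_j^*| + p_1|\gamma_1 - \gamma_1^*|_2^2 + p_\theta|\gamma_\theta - \gamma_1^*|_2^2 + |p_1(\gamma_1 - \gamma_1^*) + p_\theta(\gamma_\theta - \gamma_1^*)|_1 + \sum_{j=2}^{k^*}p_j^*|\gamma_j - \gamma_j^*|_1 \le C\sqrt{\delta_n}. \qquad (41)$$

Therefore, for $\tau$ small enough and $n$ large enough,

$$\pi_{k^*+1}\{B_2^\tau\} \le \pi_{k^*+1}\{\theta \in \Theta^* : (41) \text{ holds}\}.$$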

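The conditions on $p_j$ and $\gamma_j$ ($j = 2, \ldots, k^*$) and a symmetry argument
imply that the right-hand side term above is bounded by a constant times
$\delta_n^{(d+1)(k^*-1)/2}$ times $W_n$, where

$$W_n = \int\!\!\int \mathbb{1}\{p_\theta \ge p_1\}\,\mathbb{1}\big\{|p_1 + p_\theta - p_1^*| + p_1|\gamma_1 - \gamma_1^*|_2^2 + p_\theta|\gamma_\theta - \gamma_1^*|_2^2 + |p_1(\gamma_1 - \gamma_1^*) + p_\theta(\gamma_\theta - \gamma_1^*)|_1 \le C\sqrt{\delta_n}\big\}\,d\pi_{k^*+1}(\gamma)\,d\pi_{k^*+1}(p).$$

Note that the conditions in the integrand imply that $|\gamma_\theta - \gamma_1^*|_2^2 \le 4C\sqrt{\delta_n}/p_1^*$
and $p_\theta \ge p_1^*/4$ as soon as $C\sqrt{\delta_n} < p_1^*/2$. Simple calculus (based on M1) yields
the result.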

APPENDIX D: PROOF OF PROPOSITION 2 

It is readily seen that O2($k^*+1$) holds for the chosen approximating set.
Let us focus now on the entropy condition (6). 

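Constructing $\varepsilon$-brackets. Let $\delta_1$ satisfy (5). A convenient value will be
chosen later on. Set $j' \le \lfloor\delta_0/\delta_n\rfloor$, $\varepsilon = j'\delta_n/4$ and $\tau > 1$.

Let $\theta = (p, \gamma) \in \Theta_{k^*+1}$ be arbitrarily chosen. Let $\eta \in (0, \eta_1)$ be small
enough so that, for every $j \le k^*+1$, $u_j = \bar g_{\gamma_j,\eta}$ and $v_j = \underline{g}_{\gamma_j,\eta}$ (as defined in
M2) satisfy, for all $\gamma \in \Gamma$, $\mu(u_j - v_j)(1 + \log^2 g_\gamma) \le \varepsilon/\tau$, $P_{g_\gamma}(\log u_j - \log v_j)^2 \le (\varepsilon/\tau)^2$ and $\mu(u_j - v_j)\log^2(u_j/v_j) \le (\varepsilon/\tau)\log^2(\varepsilon/\tau)$.

If we define $v_\theta = (1 - \varepsilon/\tau)\big(\sum_{j=1}^{k^*+1}p_j v_j\big)$ and $u_\theta = (1 + \varepsilon/\tau)\big(\sum_{j=1}^{k^*+1}p_j u_j\big)$,
then there exists $\tau > 1$ (which depends only on $k^*$ and the constant $M$
of M2) such that the bracket $[v_\theta, u_\theta]$ is an $\varepsilon$-bracket. The repeated use of
$\big(\sum_j p_j u_j / \sum_j p_j v_j\big) \le \max_j u_j/v_j$ is the core of the proof, which we omit.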
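The elementary fact behind the $\varepsilon$-bracket is the inequality $\sum_j p_j u_j / \sum_j p_j v_j \le \max_j u_j/v_j$, for weights $p_j > 0$ and $v_j > 0$. It follows from $\sum_j p_j u_j = \sum_j p_j v_j\,(u_j/v_j) \le \max_j(u_j/v_j)\sum_j p_j v_j$. A quick randomized Python check of this inequality, with arbitrary illustrative inputs satisfying $u_j \ge v_j > 0$ as for the brackets above:

```python
import random


def weighted_ratio(p, u, v):
    # (sum_j p_j u_j) / (sum_j p_j v_j)
    return sum(pj * uj for pj, uj in zip(p, u)) / sum(pj * vj for pj, vj in zip(p, v))


def check_ratio_inequality(trials=10000, k=4, seed=1):
    # random check of sum p u / sum p v <= max u/v for positive p and u >= v > 0
    rng = random.Random(seed)
    for _ in range(trials):
        p = [rng.random() + 1e-9 for _ in range(k)]
        v = [rng.random() + 1e-9 for _ in range(k)]
        u = [vj * (1.0 + rng.random()) for vj in v]
        if weighted_ratio(p, u, v) > max(uj / vj for uj, vj in zip(u, v)) * (1.0 + 1e-12):
            return False
    return True


if __name__ == "__main__":
    print(check_ratio_inequality(), weighted_ratio([0.5, 0.5], [2.0, 3.0], [1.0, 1.0]))
```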
Control of the entropy. The rule $x_1(1 - \varepsilon/\tau) = e^{-n}$ and $x_{j+1}(1 - \varepsilon/\tau) = x_j(1 + \varepsilon/\tau)$ is used for defining a net for the interval $(e^{-n}, 1)$. Such a net has
at most $1 + n/\log[(1 + \varepsilon/\tau)/(1 - \varepsilon/\tau)] \le 1 + 2n\tau/\varepsilon$ support points. Using
repeatedly this construction on each dimension of the $(k^*+1)$-dimensional
simplex yields a net for $\{p \in \mathbb{R}^{k^*+1} : \min_{j \le k^*}p_j \ge e^{-n}, 1 - \sum_{j \le k^*}p_j \ge e^{-n}\}$
with at most $O((n/\varepsilon)^{k^*+1})$ support points. We can choose a net for $\Gamma^{k^*+1}$
with at most $O(\varepsilon^{-d(k^*+1)})$ support points such that each $\gamma \in \Gamma^{k^*+1}$ is within
$|\cdot|_1$-distance $\varepsilon$ of some element of the net.
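The net for $(e^{-n}, 1)$ is geometric with ratio $(1+\varepsilon/\tau)/(1-\varepsilon/\tau)$. The cells $[x_j(1-\varepsilon/\tau), x_j(1+\varepsilon/\tau)]$ tile the interval, so every point lies in the cell of some $x_j$. The Python sketch below builds the net, checks coverage on random points, and compares the number of support points with $1 + 2n\tau/\varepsilon$. The values $n = 5$, $\varepsilon = 0.1$ and $\tau = 2$ are illustrative assumptions, and the function names are hypothetical.

```python
import math
import random


def geometric_net(n, eps, tau):
    # x_1 (1 - eta) = e^{-n} and x_{j+1} (1 - eta) = x_j (1 + eta), with eta = eps / tau
    eta = eps / tau
    if not 0.0 < eta < 1.0:
        raise ValueError("need 0 < eps / tau < 1")
    points = [math.exp(-n) / (1.0 - eta)]
    while points[-1] * (1.0 + eta) < 1.0:  # stop once the cells reach 1
        points.append(points[-1] * (1.0 + eta) / (1.0 - eta))
    return points


def is_covered(x, points, eps, tau):
    # x lies in some cell [x_j (1 - eta), x_j (1 + eta)]
    eta = eps / tau
    return any(q * (1.0 - eta) <= x <= q * (1.0 + eta) for q in points)


def check_net(n=5, eps=0.1, tau=2.0, trials=1000, seed=0):
    points = geometric_net(n, eps, tau)
    rng = random.Random(seed)
    # log-uniform test points strictly inside (e^{-n}, 1)
    xs = [math.exp(rng.uniform(-n + 1e-3, -1e-3)) for _ in range(trials)]
    return len(points), all(is_covered(x, points, eps, tau) for x in xs)


if __name__ == "__main__":
    count, ok = check_net()
    print(count, ok, 1 + 2 * 5 * 2.0 / 0.1)
```

With these values, the net has $\lceil n/\log[(1.05)/(0.95)]\rceil = 50$ points, well below the crude bound $201$.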

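Consequently, the minimum number of $\varepsilon$-brackets needed to cover $\mathcal{F}_n^{k^*+1}$
is $O(n^{k^*+1}/\varepsilon^{(d+1)(k^*+1)})$, so there exist constants $a, b, c > 0$ for which

$$\mathcal{E}(\mathcal{F}_n^{k^*+1}, j'\delta_n/4) \le a\log n - b\log(j'\delta_n) + c. \qquad (42)$$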

Now, let us note that $nj'\delta_n/|\log(j'\delta_n)| \ge n\delta_n/|\log\delta_n|$ and consider each
term of the right-hand side of (42) in turn. It is readily seen that $a\log n < n\delta_n/|\log\delta_n|$ is equivalent to

$$\delta_1 > \cdots. \qquad (43)$$

Now, $-b\log(j'\delta_n) < \cdots$ if and only if $-b\log\delta_n < \cdots$. Since
$|\log\delta_n| < 4\log n$, both are valid as soon as

$$\delta_1 > 2b/\log^2 n. \qquad (44)$$

Finally, using again $|\log\delta_n| < 4\log n$ yields that $c < n\delta_n/|\log\delta_n|$ when

$$\delta_1 > 4c/\log n. \qquad (45)$$

When $\delta_1 > a$, the largest values of the right-hand sides of (43), (44) and
(45) are achieved at $n_0$. So, $\delta_1$ can be chosen large enough (independently of
$j'$ and $n$) so that (5), (43), (44) and (45) hold for all $n \ge n_0$ and $j' \le \lfloor\delta_0/\delta_n\rfloor$.
This completes the proof of Proposition 2, because $\mathcal{E}(\mathcal{F}_n^{k^*+1}, j'\delta_n/4)$ is larger
than the left-hand side of (6) (with $j'$ substituted to $j$). □